MINIMAX RATES OF COMMUNITY DETECTION IN
STOCHASTIC BLOCK MODELS

By Anderson Y. Zhang and Harrison H. Zhou

Yale University 

Recently network analysis has gained increasing attention
in statistics, as well as in computer science, probability, and applied
mathematics. Community detection for the stochastic block model
(SBM) is probably the most studied topic in network analysis. Many
methodologies have been proposed. Some beautiful and significant
phase transition results have been obtained in various settings. In this
paper, we provide a general minimax theory for community detection. It
gives minimax rates of the mis-match ratio for a wide range of settings
including homogeneous and inhomogeneous SBMs, dense and sparse
networks, finite and growing number of communities. The minimax
rates are exponential, different from the polynomial rates we often see in
the statistical literature. An immediate consequence of the result is to
establish threshold phenomena for strong consistency (exact recovery)
as well as weak consistency (partial recovery). We obtain the upper
bound by a range of penalized likelihood-type approaches. The lower
bound is achieved by a novel reduction from a global mis-match ratio
to a local clustering problem for one node through an exchangeability
property.


1. Introduction. Network science [10, 23, 28, 17] has become one of
the most active research areas over the past few years. It has applications in
many disciplines, for example, physics [24], sociology [29], biology [4], and
the Internet [2]. Detecting and identifying communities is fundamentally
important to understanding the underlying structure of the network [12]. Many
models and methodologies have been proposed for community detection from
different perspectives, including RatioCut [13], Ncut [26], and spectral methods
[19, 25, 16] from computer science, Newman-Girvan Modularity [12] from
physics, semi-definite programming [7, 14] from engineering, and maximum
likelihood estimation [3, 6] from statistics.

Deep theoretical developments have been actively pursued as well.
Recently, celebrated works of Mossel et al. [20, 21] and Massoulié [18] considered
balanced two-community sparse networks, and discovered the threshold
phenomenon for both weak and strong consistency of community detection.
Further extensions to a slowly growing number of communities have been made
in [14, 22, 8, 1]. Recently in the statistical literature, theoretical properties of
various methods have been investigated as well in [8, 31, 5, 9, 25, 16], usually
under weaker conditions and better suited for real data applications, but
the convergence rates may often be sub-optimal.

Despite recent active and significant developments in network analysis,
assumptions and conclusions can be very different in different papers. There is
no integrated framework for optimal community detection. In this paper,
we attempt to give a fundamental and unified understanding of the community
detection problem for the Stochastic Block Model (SBM). Our framework
is quite general, including homogeneous and inhomogeneous SBMs,
dense and sparse networks, equal and non-equal community sizes, and finite
and growing number of communities. For example, the connection probability
can be as small as an order of $1/n$, or as large as a constant order,
and the total number of communities can be as large as $n/\log n$. Under this
framework, a sharp minimax result is obtained with an exponential rate.
This result gives a clear and smooth transition from weak consistency
(partial recovery) to strong consistency (exact recovery), i.e., clustering error
rates from $o(1)$ to $o(n^{-1})$. As a consequence, we obtain phase transitions
for non-consistency and strong consistency, under various settings, which
recover the tight thresholds for phase transition in [20, 21, 22, 8].

The Stochastic Block Model, proposed by [15], is possibly the most studied
model in community detection [6, 25, 16]. Consider an undirected network
with $n$ nodes in total and $K$ communities labeled as $\{1, 2, \ldots, K\}$. Each node
is assigned to one community. Denote by $\sigma$ an assignment, where $\sigma(i)$ is
the community assignment of the $i$-th node. Let $n_k = |\{i : \sigma(i) = k\}|$ be
the size of the $k$-th community, for each $k \in \{1, 2, \ldots, K\}$. We observe the
connectivity of the network, which is encoded into the adjacency matrix
$\{A_{i,j}\}$ taking values in $\{0,1\}^{n \times n}$. If there exists a connection between two
nodes, $A_{i,j}$ is equal to 1, and 0 otherwise. We assume each $A_{i,j}$ for any $i > j$
to be an independent Bernoulli random variable with success probability
$\theta_{i,j}$. Let $A_{i,i} = 0$ (no self-loop) and $A_{i,j} = A_{j,i}$ (symmetry) for any $i, j$.
In the SBM, $\{\theta_{i,j}\}$ is assumed to have a blockwise structure, in the sense
that $\theta_{i,j} = \theta_{i',j'}$ if $i$ and $i'$ are from the same community, and so are $j$ and
$j'$. We require that the within-community probabilities are larger than the
between-community probabilities, as in reality individuals from the same
community are often more likely to be connected.
We consider a general SBM with parameter space defined as follows,

$$
\begin{aligned}
\Theta(n, K, a, b, \beta) = \Big\{ & (\sigma, \{\theta_{i,j}\}) : \sigma : [n] \to [K]^n,\ n_k \in \Big[\frac{n}{\beta K}, \frac{\beta n}{K}\Big],\ \forall k \in [K],\ \{\theta_{i,j}\} \in [0,1]^{n \times n}, \\
& \theta_{i,j} \ge \frac{a}{n} \text{ if } \sigma(i) = \sigma(j) \text{ and } \theta_{i,j} \le \frac{b}{n} \text{ if } \sigma(i) \ne \sigma(j),\ \theta_{i,i} = 0,\ \forall i \ne j \Big\},
\end{aligned}
$$

where $\beta \ge 1$ and is bounded. When $\beta = 1 + o(1)$, all communities have almost
the same size. The parameters $a/n$ and $b/n$ have straightforward interpretations,
with the former as the smallest within-community probability
and the latter as the largest between-community probability. Throughout the
paper, we assume $\epsilon < b < a$ and $a/n < 1 - \epsilon$ for a small constant $\epsilon > 0$,
allowing the network to range from very sparse to very dense.

We use the mis-match ratio $r(\sigma, \hat\sigma)$ to measure the performance of community
detection. It is the proportion of nodes mis-clustered by $\hat\sigma$ against
the truth $\sigma$. The exact definition is given in Section 2.1. The minimax rate
for the parameter space $\Theta(n, K, a, b, \beta)$ in terms of the mis-match ratio loss
is as follows.



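Theorem 1.1. Assume $\frac{nI}{K \log K} \to \infty$, then

$$
\inf_{\hat\sigma} \sup_{\Theta} \mathbb{E}\, r(\sigma, \hat\sigma) =
\begin{cases}
\exp\big(-(1 + o(1))\frac{nI}{2}\big), & K = 2, \\
\exp\big(-(1 + o(1))\frac{nI}{\beta K}\big), & K \ge 3,
\end{cases}
\tag{1.1}
$$

where $1 \le \beta < \sqrt{5/3}$. In addition, if $nI/K = O(1)$, there are at least a constant
proportion of nodes mis-clustered, i.e., $\inf_{\hat\sigma} \sup_{\Theta(n,K,a,b,\beta)} \mathbb{E}\, r(\sigma, \hat\sigma) \ge c$,
for some fixed constant $c > 0$.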


Note that when $K$ is finite, $nI \to \infty$ is a sufficient condition to get Equation
(1.1) since it is equivalent to $\frac{nI}{K \log K} \to \infty$. Here the key quantity $I$ is defined
as

$$
I = -2 \log\Big( \sqrt{\frac{a}{n} \cdot \frac{b}{n}} + \sqrt{\Big(1 - \frac{a}{n}\Big)\Big(1 - \frac{b}{n}\Big)} \Big),
$$

which is exactly $D_{1/2}(\mathrm{Ber}(a/n) \,\|\, \mathrm{Ber}(b/n))$, the Rényi divergence of order 1/2
between two Bernoulli distributions $\mathrm{Ber}(a/n)$ and $\mathrm{Ber}(b/n)$. The form of $I$ is
closely related to the Hellinger distance between those two Bernoulli probability
measures. It is worth pointing out that $I$ is equal to $(a - b)^2/(an)$, up
to a constant factor, as long as $a/n < 1 - \epsilon$ for some $\epsilon > 0$; this quantity can
be interpreted as the signal-to-noise ratio. In particular, when $a = o(n)$, $I$ is
equal to $(1 + o(1))(\sqrt{a} - \sqrt{b})^2/n$.
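
As a numerical illustration of the quantity $I$, the following minimal sketch evaluates the Rényi divergence of order 1/2 between Ber(a/n) and Ber(b/n) and compares it with the sparse approximation $(\sqrt{a} - \sqrt{b})^2/n$. The function names and the values of n, a and b are illustrative assumptions, not taken from the paper.

```python
import math


def renyi_half(a: float, b: float, n: int) -> float:
    """I = -2 log( sqrt((a/n)(b/n)) + sqrt((1 - a/n)(1 - b/n)) )."""
    p, q = a / n, b / n
    return -2.0 * math.log(math.sqrt(p * q) + math.sqrt((1 - p) * (1 - q)))


def sparse_approximation(a: float, b: float, n: int) -> float:
    """(sqrt(a) - sqrt(b))^2 / n, the leading term of I when a = o(n)."""
    return (math.sqrt(a) - math.sqrt(b)) ** 2 / n


# Illustrative (assumed) values: a sparse network with a, b of constant order.
n, a, b = 10_000, 20.0, 5.0
exact = renyi_half(a, b, n)
approx = sparse_approximation(a, b, n)
print(f"I = {exact:.6e}, sparse approximation = {approx:.6e}")
```

For these values the two numbers agree to within about one percent, in line with the $(1 + o(1))$ factor in the text.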

The lower bound of (1.1) is achieved by a novel reduction of the global
minimax rate into a local testing problem. A range of new penalized
likelihood-type methods are proposed for obtaining the upper bound. These ideas
inspired the follow-up paper [11] to develop polynomial-time and rate-optimal
algorithms.

Theorem 1.1 covers both dense and sparse networks. It holds for a wide
range of possible values of $a$ and $b$, from a constant order to an order of $n$.
It implies that when the connectivity probability $a/n$ is $O(n^{-1})$, no
consistent algorithm exists for community detection. The number of communities
$K$ is allowed to grow fast. It can be as large as the order of $n/\log n$
when the connectivity probability is of constant order, in which case each
community contains an order of $\log n$ nodes. In addition, for a finite number of
communities, Theorem 1.1 shows that $nI \to \infty$ is a necessary and sufficient
condition for consistent community detection, which implies the consistency
results in [20, 21]. It also recovers the strong consistency results in [22, 14], in
which they additionally assume $a \asymp \log n$.

The minimax rate is of an exponential form, in contrast to the polynomial
rates in [25, 16]. The term $nI/K$ plays a dominating role in determining the
rate. Consider the $\beta = 1$ case. Rewrite $nI/K$ in the form of $\rho \log n$; then
approximately we fail to recover essentially $n^{1-\rho}$ nodes. When $\rho > 1$, the
network enjoys the strong consistency property (exact recovery) since $n^{1-\rho} = o(1)$,
i.e., every node is correctly clustered. For $0 < \rho < 1$, by contrast, it is impossible
to recover the communities exactly.
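
The heuristic above can be sketched numerically. The snippet below is a minimal illustration under assumed values: it computes $\rho = nI/(K \log n)$ and the approximate number $n^{1-\rho}$ of unrecoverable nodes, ignoring the $(1 + o(1))$ factor. The helper names and parameter choices are hypothetical.

```python
import math


def renyi_half(a: float, b: float, n: int) -> float:
    """I = -2 log( sqrt((a/n)(b/n)) + sqrt((1 - a/n)(1 - b/n)) )."""
    p, q = a / n, b / n
    return -2.0 * math.log(math.sqrt(p * q) + math.sqrt((1 - p) * (1 - q)))


def rho(n: int, K: int, a: float, b: float) -> float:
    """The exponent rho such that n I / K = rho * log n."""
    return n * renyi_half(a, b, n) / (K * math.log(n))


def approx_misclustered_nodes(n: int, K: int, a: float, b: float) -> float:
    """Heuristic count n^(1 - rho); the (1 + o(1)) factor is ignored."""
    return n ** (1.0 - rho(n, K, a, b))


n, K = 10_000, 2
log_n = math.log(n)
# Assumed regimes: a and b proportional to log n.
strong = rho(n, K, 16 * log_n, 4 * log_n)   # roughly 2: exact recovery regime
weak = rho(n, K, 4 * log_n, 1 * log_n)      # roughly 1/2: some nodes are lost
print(strong, approx_misclustered_nodes(n, K, 16 * log_n, 4 * log_n))
print(weak, approx_misclustered_nodes(n, K, 4 * log_n, 1 * log_n))
```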

Organization. The paper is organized as follows. The fundamental limits 
of community detection are discussed in Section 2. We present the penalized 
likelihood-type procedures in Section 3 to achieve the optimal rate. Some 
special cases of our result and the computational feasibility are discussed in 
Section 4. Section 5 gives the proofs of the main theorems, while Section 6 
provides the proofs of key technical lemmas. 

Notation. For any set $B$, we use $|B|$ to indicate its cardinality. For two
arbitrary equal-length vectors $x = \{x_i\}$ and $y = \{y_i\}$, define the Hamming
distance between $x$ and $y$ as $d_H(x, y) = |\{i : x_i \ne y_i\}|$, i.e., the number of
coordinates with different values. For any positive integer $m$, we use $[m]$ to
denote the set $\{1, 2, \ldots, m\}$. For any two random variables $X$ and $Y$, we use
$X \perp Y$ to indicate that they are independent. Denote by $\mathrm{Ber}(q)$ a Bernoulli
distribution with success probability $q$, and by $\mathrm{Bin}(m, q)$ a binomial
distribution with $m$ trials and success probability $q$. For two positive sequences
$x_n$ and $y_n$, $x_n \lesssim y_n$ means $x_n \le c y_n$ for some constant $c$ not depending on
$n$. We adopt the notation $x_n \asymp y_n$ if $x_n \lesssim y_n$ and $y_n \lesssim x_n$. For any scalar $z$,
let $\lfloor z \rfloor = \max\{m \in \mathbb{Z} : m \le z\}$ and $\lceil z \rceil = \min\{m \in \mathbb{Z} : m \ge z\}$. We use $\Theta$ as shorthand for
$\Theta(n, K, a, b, \beta)$ when there is no ambiguity in dropping the index $(n, K, a, b, \beta)$.

2. Fundamental Limits of Community Detection.

2.1. Mis-match Ratio. Before giving the exact definition of the mis-match
ratio, we need to introduce permutations $\delta : [K] \to [K]$ to define equivalent
partitions; let $\Delta$ denote the set of all such permutations. For the community
detection problem, there exists an identifiability issue involved with the
community labels. For instance, for a network with 4 nodes, assignments
(1, 1, 2, 2) and (2, 2, 1, 1) give the same network partition. For $\delta \in \Delta$,
define $\delta \circ \sigma$ to be a new assignment with $(\delta \circ \sigma)(i) = \delta(\sigma(i))$
for each $i \in [n]$. This assignment is equivalent to $\sigma$. The mis-match ratio
is used as the loss function, counting the proportion of nodes incorrectly
clustered, minimizing over all the possible permutations as follows,

$$
r(\sigma, \hat\sigma) = \inf_{\delta \in \Delta} d_H(\sigma, \delta \circ \hat\sigma)/n.
$$

The Hamming distance between $\sigma$ and $\hat\sigma$ simply counts the number of
entries having different values in the two vectors. Thus $r(\sigma, \hat\sigma)$ is the total number
of errors divided by the total number of nodes.
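
The definition of the mis-match ratio can be made concrete with a small brute-force sketch. It enumerates all $K!$ label permutations, so it is only meant for tiny $K$; the function names are illustrative, and labels are assumed to lie in $\{1, \ldots, K\}$.

```python
from itertools import permutations


def hamming(x, y):
    """Number of coordinates where the two equal-length vectors differ."""
    return sum(xi != yi for xi, yi in zip(x, y))


def mismatch_ratio(sigma, sigma_hat, K):
    """min over label permutations delta of d_H(sigma, delta o sigma_hat) / n."""
    n = len(sigma)
    best = min(
        hamming(sigma, [delta[s - 1] for s in sigma_hat])
        for delta in permutations(range(1, K + 1))
    )
    return best / n


# The two assignments from the text describe the same partition.
print(mismatch_ratio([1, 1, 2, 2], [2, 2, 1, 1], K=2))
# One node is placed in the wrong community.
print(mismatch_ratio([1, 1, 2, 2], [1, 2, 2, 2], K=2))
```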

2.2. Homogeneous Stochastic Block Model. The Stochastic Block Model
assumes the network has an underlying blockwise structure. When all $\theta_{i,j}$
take two possible values $a/n$ or $b/n$, depending on whether $\sigma(i) = \sigma(j)$ or
not, we call the SBM homogeneous. In this case $\{\theta_{i,j}\}$ is unique for any given
$\sigma$. The homogeneous SBM is the most studied model in computer science
and probability [20, 21, 22, 14, 8]. Define

$$
\Theta_1(n, K, a, b, \beta) = \Big\{ (\sigma, \{\theta_{i,j}\}) \in \Theta(n, K, a, b, \beta) : \theta_{i,j} = \frac{a}{n} \text{ if } \sigma(i) = \sigma(j) \text{ and } \theta_{i,j} = \frac{b}{n} \text{ if } \sigma(i) \ne \sigma(j),\ \forall i \ne j \Big\}.
$$

This is a homogeneous SBM. In $\Theta_1$, since $\{\theta_{i,j}\}$ is uniquely determined by
any given $\sigma$, we may write $\sigma \in \Theta_1$ instead of $(\sigma, \{\theta_{i,j}\}) \in \Theta_1$ for simplicity.
The same rule may be applied for any other homogeneous SBM.
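
To make the homogeneous model concrete, here is a minimal sampling sketch using only the standard library. The function name, the seed argument and the example values are illustrative assumptions; the construction itself follows the definition: independent Bernoulli edges with probability $a/n$ within communities and $b/n$ between them, no self-loops, and a symmetric adjacency matrix.

```python
import random


def sample_homogeneous_sbm(sigma, a, b, seed=0):
    """Draw an adjacency matrix A from the homogeneous SBM with assignment sigma."""
    rng = random.Random(seed)
    n = len(sigma)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i):  # i > j, one independent draw per unordered pair
            prob = a / n if sigma[i] == sigma[j] else b / n
            A[i][j] = A[j][i] = 1 if rng.random() < prob else 0
    return A


# Two communities of size 4 (assumed example values for a and b).
sigma = [1, 1, 1, 1, 2, 2, 2, 2]
A = sample_homogeneous_sbm(sigma, a=6.0, b=1.0, seed=1)
for row in A:
    print(row)
```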

Note that $\Theta_1$ is closed under permutation. Let $\pi$ be any permutation on
$[n]$; then for any $\sigma \in \Theta_1$, a new assignment $\sigma'$ defined as $\sigma'(i) = \sigma(\pi^{-1}(i))$
also belongs to $\Theta_1$. This property is very helpful for us to show that $\Theta_1$ is a least
favorable subspace of $\Theta$ for community detection. A minimax lower bound
over $\Theta_1$ immediately gives a lower bound for a larger parameter space, such
as $\Theta$.

2.3. From Global to Local. Establishing a lower bound is challenging if we
work with the loss function $r(\sigma, \hat\sigma)$ directly, as it takes an infimum over an
equivalence class. The mis-match ratio is a global property of the network.
The key idea in this paper is to define a local loss, and to reduce the global
minimax problem into a local classification problem for one node.

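The local loss focuses only on one node. Given the truth $\sigma$ and any
procedure $\hat\sigma$, the loss of estimating the label for the $i$-th node is defined as follows.
Let $S_\sigma(\hat\sigma) = \{\sigma' : \sigma' = \delta \circ \hat\sigma,\ \delta \in \Delta,\ d_H(\sigma', \sigma) = \inf_{\delta \in \Delta} d_H(\delta \circ \hat\sigma, \sigma)\}$, and define

$$
r(\sigma(i), \hat\sigma(i)) = \sum_{\sigma' \in S_\sigma(\hat\sigma)} \frac{d_H(\sigma(i), \sigma'(i))}{|S_\sigma(\hat\sigma)|},
$$

for each $i \in [n]$. It is an average over all the possible $\sigma' \in S_\sigma(\hat\sigma)$.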

We will see later that it is relatively easy to study the local loss. Lemma 
2.1 shows that the global loss is equal to the local one when the SBM is 
homogeneous and closed under permutation. 

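Lemma 2.1 (Global to local). Let $\Lambda$ be any homogeneous parameter
space that is closed under permutation. Let $\tau$ be the uniform prior over all the
elements in $\Lambda$. Define the global Bayesian risk as $B_\tau(\hat\sigma) = \frac{1}{|\Lambda|} \sum_{\sigma \in \Lambda} \mathbb{E}\, r(\sigma, \hat\sigma)$
and the local Bayesian risk as $B_\tau(\hat\sigma(1)) = \frac{1}{|\Lambda|} \sum_{\sigma \in \Lambda} \mathbb{E}\, r(\sigma(1), \hat\sigma(1))$ for the first
node. Then

$$
\inf_{\hat\sigma} B_\tau(\hat\sigma) = \inf_{\hat\sigma} B_\tau(\hat\sigma(1)).
$$

The proof of Lemma 2.1 is involved. It is established by exploiting the
exchangeability property of the parameter space $\Lambda$.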

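2.4. Minimax Lower Bound. By constructing a least favorable case of
$\Theta_1$, we have the following lower bound for the minimax rate. We present the
lower bound under milder conditions than what is stated in Theorem 1.1.

Theorem 2.1. Under the assumption $\frac{nI}{K} \to \infty$, we have

$$
\inf_{\hat\sigma} \sup_{\Theta_1(n,K,a,b,\beta)} \mathbb{E}\, r(\sigma, \hat\sigma) \ge
\begin{cases}
\exp\big(-(1 + o(1))\frac{nI}{2}\big), & K = 2, \\
\exp\big(-(1 + o(1))\frac{nI}{\beta K}\big), & K \ge 3.
\end{cases}
\tag{2.1}
$$

If $\frac{nI}{K} = O(1)$, then $\inf_{\hat\sigma} \sup_{\Theta_1(n,K,a,b,\beta)} \mathbb{E}\, r(\sigma, \hat\sigma) \ge c$ for some positive
constant $c$.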
The forms of the minimax rates are different for the two cases $K \ge 3$ and $K = 2$.

For $K \ge 3$, it is relatively more challenging to discover and distinguish
small communities than communities with larger sizes. The least
favorable case is the one in which at least a constant proportion of
communities are of size $\frac{n}{\beta K}$. The hardness of community detection in this
setting is then determined by the ability to recover and distinguish such
small communities. For $K = 2$, the least favorable setting in $\Theta_1$ is when
the two communities are of the same size. When there are only two
communities, it is actually easier to recover non-equal-sized communities,
by identifying the larger one first and then labeling the remaining nodes as
belonging to the smaller one.

Approximately Equal-Sized Case: We are interested in the case with
$\beta = 1 + o(1)$, where communities are almost of the same size. Networks with
community sizes exactly equal to $\frac{n}{K}$ are the most studied settings [8, 21, 9].
Here we allow a small fluctuation of community sizes. Denote $\Theta^0$ as follows,

$$
\begin{aligned}
\Theta^0(n, K, a, b) = \Big\{ & (\sigma, \{\theta_{i,j}\}) : \sigma : [n] \to [K]^n,\ n_k = (1 + o(1))\frac{n}{K},\ \forall k \in [K], \\
& \theta_{i,i} = 0,\ \forall i \in [n],\ \theta_{i,j} = \frac{a}{n} \text{ if } \sigma(i) = \sigma(j) \text{ and } \theta_{i,j} = \frac{b}{n} \text{ if } \sigma(i) \ne \sigma(j),\ \forall i \ne j \Big\}.
\end{aligned}
$$

Note that $\Theta^0(n, K, a, b)$ is $\Theta_1(n, K, a, b, \beta)$ with $\beta = 1 + o(1)$, for which we
have the following minimax lower bound.

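Theorem 2.2. Under the assumption $\frac{nI}{K} \to \infty$, we have

$$
\inf_{\hat\sigma} \sup_{\Theta^0(n,K,a,b)} \mathbb{E}\, r(\sigma, \hat\sigma) \ge \exp\Big(-(1 + o(1))\frac{nI}{K}\Big).
\tag{2.2}
$$

If $\frac{nI}{K} = O(1)$, then $\inf_{\hat\sigma} \sup_{\Theta^0(n,K,a,b)} \mathbb{E}\, r(\sigma, \hat\sigma) \ge c$ for some positive constant
$c$.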

In contrast to Theorem 2.1, the forms of the rates for $K = 2$ and $K \ge 3$ are
the same in $\Theta^0$. The proof of Theorem 2.2 is provided in Section 5. We defer
the proof of Theorem 2.1 to the supplementary material [30], since it is almost
identical to that of Theorem 2.2.


3. Rate-optimal Procedure. We develop a range of penalized
likelihood-type procedures to achieve the optimal mis-match ratio. Throughout the
section, $\sigma_0$ denotes the underlying truth.
3.1. Penalized Likelihood-type Estimation. The penalized procedure is
based on the likelihood of a homogeneous network, although risk upper
bounds are established for more general networks. If the network is
homogeneous ($\Theta^0$ and $\Theta_1$), for which the within and between community
probabilities are exactly equal to $a/n$ and $b/n$ respectively, the log-likelihood function
is

$$
\begin{aligned}
L(\sigma; A) = {} & \log\Big(\frac{a}{n}\Big) \sum_{i<j} A_{i,j} 1\{\sigma(i) = \sigma(j)\} + \log\Big(1 - \frac{a}{n}\Big) \sum_{i<j} (1 - A_{i,j}) 1\{\sigma(i) = \sigma(j)\} \\
& + \log\Big(\frac{b}{n}\Big) \sum_{i<j} A_{i,j} 1\{\sigma(i) \ne \sigma(j)\} + \log\Big(1 - \frac{b}{n}\Big) \sum_{i<j} (1 - A_{i,j}) 1\{\sigma(i) \ne \sigma(j)\}.
\end{aligned}
$$

Since $\sum_{i<j} A_{i,j} 1\{\sigma(i) = \sigma(j)\} + \sum_{i<j} A_{i,j} 1\{\sigma(i) \ne \sigma(j)\} = \sum_{i<j} A_{i,j}$ for all $\sigma$, we
can write $L(\sigma; A)$ as

$$
L(\sigma; A) = \log\frac{a(1 - b/n)}{b(1 - a/n)} \sum_{i<j} A_{i,j} 1\{\sigma(i) = \sigma(j)\} - \log\frac{1 - b/n}{1 - a/n} \sum_{i<j} 1\{\sigma(i) = \sigma(j)\} + f(A),
$$

where $f(A)$ is a function not depending on $\sigma$. Then the maximum likelihood
estimator is as follows,

$$
\hat\sigma_{MLE} = \arg\max_{\sigma} L(\sigma; A) = \arg\max_{\sigma} \Big\{ \log\frac{a(1 - b/n)}{b(1 - a/n)} \sum_{i<j} A_{i,j} 1\{\sigma(i) = \sigma(j)\} - \log\frac{1 - b/n}{1 - a/n} \sum_{i<j} 1\{\sigma(i) = \sigma(j)\} \Big\}.
\tag{3.1}
$$

The objective of the above maximum likelihood estimator can be decomposed into two terms.
The first one is the sum of $A_{i,j}$ over all $i$ and $j$ belonging to the same
community of $\sigma$. The second term is a penalty over the sum of sizes of
all communities. There is a trade-off between these two terms. The first
term is maximized when there is only one community, while the second
term, a penalty term, is maximized when all community sizes are equal.
However, the second term is dropped when the community sizes are
required to be exactly equal, i.e., the maximum likelihood estimator over all
$\sigma$ with community size $n/K$ for every community has a simpler form,
$\hat\sigma = \arg\max_{\sigma} \sum_{i<j} A_{i,j} 1\{\sigma(i) = \sigma(j)\}$.

When the parameter space is not homogeneous (e.g. $\Theta$), the maximum
likelihood estimator may not have a simple form as in Equation (3.1). However,
we still propose to use the identical simple form of penalized likelihood
estimator as in Equation (3.1), i.e.,

$$
\hat\sigma = \arg\max_{\sigma \in \Theta} T(\sigma) \quad \text{with} \quad T(\sigma) = \sum_{i<j} A_{i,j} 1\{\sigma(i) = \sigma(j)\} - \lambda \sum_{i<j} 1\{\sigma(i) = \sigma(j)\},
$$

where we set

$$
\lambda = \log\Big(\frac{1 - b/n}{1 - a/n}\Big) \Big/ \log\Big(\frac{a(1 - b/n)}{b(1 - a/n)}\Big), \quad \forall K \ge 2.
\tag{3.2}
$$

When the parameter space is homogeneous, $\hat\sigma$ is identical to the maximum
likelihood estimator. The optimality result will be obtained for the parameter
space $\Theta$, which allows the network to be inhomogeneous, and imbalanced
in the sense that the community sizes may be different.
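
The penalized criterion can be illustrated with a brute-force sketch on a toy graph. This is only an assumption-laden illustration: it takes $T(\sigma) = \sum_{i<j} (A_{i,j} - \lambda) 1\{\sigma(i) = \sigma(j)\}$ with $\lambda$ as in (3.2), and, as a simplification, it searches over all $K^n$ label vectors rather than only those satisfying the community-size constraints of $\Theta$. The function names and the example graph are hypothetical.

```python
import math
from itertools import product


def penalty_lambda(a: float, b: float, n: int) -> float:
    """lambda = log((1 - b/n)/(1 - a/n)) / log(a(1 - b/n) / (b(1 - a/n)))."""
    p, q = a / n, b / n
    return math.log((1 - q) / (1 - p)) / math.log(p * (1 - q) / (q * (1 - p)))


def T(sigma, A, lam):
    """Penalized objective: sum over same-community pairs of (A_ij - lambda)."""
    n = len(sigma)
    return sum(
        A[i][j] - lam
        for i in range(n)
        for j in range(i + 1, n)
        if sigma[i] == sigma[j]
    )


def penalized_estimator(A, K, lam):
    """Exhaustive search over all K^n assignments (simplification: no size constraint)."""
    n = len(A)
    return max(product(range(1, K + 1), repeat=n), key=lambda s: T(s, A, lam))


# Toy graph: two disjoint triangles on nodes {0,1,2} and {3,4,5}.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]
A = [[0] * 6 for _ in range(6)]
for i, j in edges:
    A[i][j] = A[j][i] = 1

lam = penalty_lambda(a=5.0, b=1.0, n=6)  # assumed values with b < a < n
sigma_hat = penalized_estimator(A, K=2, lam=lam)
print(lam, sigma_hat)
```

The exhaustive search is exponential in $n$, which is why this sketch is restricted to a six-node example.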


3.2. Other Choices of $\lambda$. In the previous section we provide a unified $\lambda$
for the penalized likelihood-type estimation for both $K = 2$ and $K \ge 3$. It
is worthwhile to point out that for $K \ge 3$ the optimality can be attained
for a wide range of $\lambda$. Let $t^* = \frac{1}{2} \log\frac{a(1 - b/n)}{b(1 - a/n)}$. It can be shown that $t^*$ is
the minimizer of the moment generating function for the difference of two
Bernoulli variables, i.e., $t^* = \arg\min_{t > 0} \mathbb{E} e^{t(X - Y)}$, where $X \sim \mathrm{Ber}(b/n)$ and
$Y \sim \mathrm{Ber}(a/n)$. It is equivalent to write $\lambda$ in Equation (3.2) as follows.


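$$
\begin{aligned}
\lambda &= -\frac{1}{2t^*} \log\frac{\frac{a}{n} e^{-t^*} + 1 - \frac{a}{n}}{\frac{b}{n} e^{t^*} + 1 - \frac{b}{n}} \\
&= -\frac{1}{2t^*} \log\Big(\frac{a}{n} e^{-t^*} + 1 - \frac{a}{n}\Big) + \frac{1}{2t^*} \log\Big(\frac{b}{n} e^{t^*} + 1 - \frac{b}{n}\Big).
\end{aligned}
$$
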

From the equation above, we can interpret $\lambda$ as a weighted sum of two
terms, with the first mainly involving the within-community probability
and the second mainly involving the between-community probability.
Define


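$$
\lambda =
\begin{cases}
-\frac{1}{2t^*} \log\big(\frac{a}{n} e^{-t^*} + 1 - \frac{a}{n}\big) + \frac{1}{2t^*} \log\big(\frac{b}{n} e^{t^*} + 1 - \frac{b}{n}\big), & K = 2, \\
-\frac{w}{t^*} \log\big(\frac{a}{n} e^{-t^*} + 1 - \frac{a}{n}\big) + \frac{1 - w}{t^*} \log\big(\frac{b}{n} e^{t^*} + 1 - \frac{b}{n}\big), & K \ge 3,
\end{cases}
\tag{3.3}
$$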


where $w$ is any constant in $[0, 1]$. We can clearly see that $\lambda$ in Equation (3.2)
is a special case of $\lambda$ in (3.3) with $w = 1/2$. In Section 3.3, we give theoretical
properties of penalized likelihood estimation for all $\lambda$ in Equation (3.3).
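
The relations in this subsection can be checked numerically. The sketch below assumes the weighted form $\lambda = -\frac{w}{t^*} \log(\frac{a}{n} e^{-t^*} + 1 - \frac{a}{n}) + \frac{1-w}{t^*} \log(\frac{b}{n} e^{t^*} + 1 - \frac{b}{n})$ and the convention $X \sim \mathrm{Ber}(b/n)$, $Y \sim \mathrm{Ber}(a/n)$ for the moment generating function; the function names and parameter values are illustrative.

```python
import math


def t_star(a, b, n):
    """t* = (1/2) log( a(1 - b/n) / (b(1 - a/n)) )."""
    p, q = a / n, b / n
    return 0.5 * math.log(p * (1 - q) / (q * (1 - p)))


def lambda_closed_form(a, b, n):
    """lambda as a ratio of two logarithms, as in Equation (3.2)."""
    p, q = a / n, b / n
    return math.log((1 - q) / (1 - p)) / math.log(p * (1 - q) / (q * (1 - p)))


def lambda_weighted(a, b, n, w=0.5):
    """Weighted form with weight w on the within-community term."""
    p, q = a / n, b / n
    t = t_star(a, b, n)
    within = math.log(p * math.exp(-t) + 1 - p)
    between = math.log(q * math.exp(t) + 1 - q)
    return -(w / t) * within + ((1 - w) / t) * between


def mgf(t, a, b, n):
    """E exp(t (X - Y)) with X ~ Ber(b/n), Y ~ Ber(a/n) independent."""
    p, q = a / n, b / n
    return (q * math.exp(t) + 1 - q) * (p * math.exp(-t) + 1 - p)


n, a, b = 1_000, 30.0, 10.0  # assumed example values
ts = t_star(a, b, n)
I = -2.0 * math.log(
    math.sqrt((a / n) * (b / n)) + math.sqrt((1 - a / n) * (1 - b / n))
)
print(lambda_closed_form(a, b, n), lambda_weighted(a, b, n, w=0.5))
print(mgf(ts, a, b, n), math.exp(-I))
```

Under these assumptions the weighted form with $w = 1/2$ coincides with (3.2), and the minimized moment generating function equals $e^{-I}$.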


3.3. Minimax Upper Bound. For the general SBM $\Theta$, the risk upper
bound of the penalized likelihood estimator, for every $\lambda$ in Equation (3.3)
defined in the previous section, matches the minimax lower bound given in
Theorem 2.1.


Theorem 3.1. Assume $\frac{nI}{K \log K} \to \infty$ and $K \ge 2$. For the penalized
maximum likelihood estimator $\hat\sigma$ with $\lambda$ defined in (3.3), we have

$$
\sup_{\Theta(n,K,a,b,\beta)} \mathbb{E}\, r(\sigma, \hat\sigma) \le
\begin{cases}
\exp\big(-(1 + o(1))\frac{nI}{2}\big), & K = 2, \\
\exp\big(-(1 + o(1))\frac{nI}{\beta K}\big), & K \ge 3,
\end{cases}
$$

where $1 \le \beta < \sqrt{5/3}$.

Approximately Equal-Sized Case: For the special parameter space $\Theta^0$
for which community sizes are almost equal, we have the following result, in a
form analogous to Theorem 3.1.

Theorem 3.2. Assume $\frac{nI}{K \log K} \to \infty$ and $K \ge 2$. For the penalized
maximum likelihood estimator $\hat\sigma$ with $\lambda$ defined in (3.3), we have

$$
\sup_{\Theta^0(n,K,a,b)} \mathbb{E}\, r(\sigma, \hat\sigma) \le \exp\Big(-(1 + o(1))\frac{nI}{K}\Big).
$$

The proof of the above theorem is provided in Section 5. Due to the
similarity, the proof of Theorem 3.1 is given in the supplementary material [30].

4. Discussion. 


4.1. Implications on Sharp Thresholds. The minimax rates in Theorem
1.1 immediately imply various sharp thresholds in [20, 21, 22, 14]. By setting
the rates equal to $o(1/n)$ or $o(1)$, we can get critical values for strong and
weak consistency respectively, under various settings.

Special Case with $a = o(n)$ and $(a - b)/a = o(1)$. Under this scenario the
difference between the within-community probability and the between-community
probability is relatively small. Note that $I = (1 + o(1))(a - b)^2/(4an)$, which
reduces the minimax result to the form $\exp(-(1 + o(1))(a - b)^2/(4aK))$.
In the case of $K = 2$, Theorem 1.1 implies the results from [20, 21]. With an
additional assumption on the order of $a$ and $b$, they show that $(a - b)^2/a \to \infty$ is
the necessary and sufficient condition to get consistency. It also agrees with
the sharp threshold for strong consistency in [22].


Special Case with Probability in the Order of $\log n / n$. Consider a more
special setting where $a$ and $b$ are in the order of $\log n$. Denote $a = c_1 \log n$
and $b = c_2 \log n$, with $c_1 > c_2 > 0$. Note that $I$ can be written as $I =
(1 + o(1))(\sqrt{c_1} - \sqrt{c_2})^2 \log n / n$.

Corollary 4.1. Assume K = There exists a strongly consistent 

estimator if lim inf„_>.oo ^ ■ 


For any finite $K$, the recovery threshold is identical to the result in [14].
For the two-community case with $a = c_1 \log n$, $b = c_2 \log n$ and constants
$c_1 > c_2 > 0$, the condition $\sqrt{c_1} - \sqrt{c_2} > \sqrt{2}$ for exact recovery is proved in [22].
4.2. Computational Feasibility. The penalized likelihood estimator we
propose searches over all the possible assignments in the parameter space. It is
computationally intractable due to the enormous number of
assignments. However, the idea of reducing global estimation to a local testing
problem, which we developed in this paper, establishes a guideline for constructing
algorithms that are both efficient and optimal. Following the global-to-local scheme,
the penalized likelihood estimator can be further modified into a node-wise
procedure, whose purpose is to assign the labels node by node. In this way
the exhaustive search over the parameter space is avoided and the
computational complexity is dramatically reduced. By exploiting the local idea, in the
subsequent paper [11] a two-stage algorithm is proposed to simultaneously
achieve the optimal rate and computational feasibility.

5. Proofs of Main Theorems. In this section, we prove two main
theorems, Theorem 2.2 and Theorem 3.2. The proofs of Theorem 2.1 and
Theorem 3.1 are almost identical to those of Theorem 2.2 and Theorem 3.2.
We put them in the supplementary material [30].

5.1. Proof of Theorem 2.2. To get the lower bound for the parameter
space $\Theta^0$, we will first construct and analyze a least favorable case in terms
of the sizes of the communities. In particular the community sizes only take
values in $\{\lfloor \frac{n}{K} \rfloor, \lfloor \frac{n}{K} \rfloor + 1, \lfloor \frac{n}{K} \rfloor - 1\}$, and the number of communities with size $\lfloor \frac{n}{K} \rfloor$
or $\lfloor \frac{n}{K} \rfloor + 1$ is a constant proportion of $K$.

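First consider the case with $K \ge 3$. For each pair $(n, K)$, the integer $K$
can always be decomposed as the sum of three integers: $K = K_1 + K_2 + K_3$,
satisfying (1) there exists a constant $\epsilon > 0$ such that $\epsilon K \le \min(K_1, K_2) \le
\max(K_1, K_2) \le (1 - \epsilon)K$; and (2) either of the following two conditions:

$$
\Big\lfloor \frac{n}{K} \Big\rfloor K_1 + \Big(\Big\lfloor \frac{n}{K} \Big\rfloor + 1\Big) K_2 + \Big(\Big\lfloor \frac{n}{K} \Big\rfloor - 1\Big) K_3 = n;
\tag{5.1}
$$

$$
\text{or} \quad \Big\lceil \frac{n}{K} \Big\rceil K_1 + \Big(\Big\lceil \frac{n}{K} \Big\rceil + 1\Big) K_2 + \Big(\Big\lceil \frac{n}{K} \Big\rceil - 1\Big) K_3 = n.
\tag{5.2}
$$

When $K \ge 3$, it can be shown that such a decomposition always exists. Write
$n = \lfloor \frac{n}{K} \rfloor K + r$, where $0 \le r \le K - 1$ is an integer. If $r \ge 2\epsilon K$ and
$r \le (1 - 2\epsilon)K$ for a constant $\epsilon > 0$, we have $n = \lfloor \frac{n}{K} \rfloor (K - r) + (\lfloor \frac{n}{K} \rfloor + 1) r$,
which satisfies Equation (5.1). Otherwise, if $r < 2\epsilon K$ for a small positive
constant $\epsilon$, write

$$
n = \Big\lfloor \frac{n}{K} \Big\rfloor \Big(K - 2\Big\lfloor \frac{K}{3} \Big\rfloor - r\Big) + \Big(\Big\lfloor \frac{n}{K} \Big\rfloor + 1\Big)\Big(\Big\lfloor \frac{K}{3} \Big\rfloor + r\Big) + \Big(\Big\lfloor \frac{n}{K} \Big\rfloor - 1\Big) \Big\lfloor \frac{K}{3} \Big\rfloor,
$$

which satisfies Equation (5.1) for $\epsilon$ sufficiently small. If $r > (1 - 2\epsilon)K$, i.e., $K - r < 2\epsilon K$, we may
argue similarly to get Equation (5.2).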
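
The construction for Equation (5.1) can be sketched in a few lines. The function names, the default value of eps, and the use of $\lfloor K/3 \rfloor$ as the number of communities of the smallest size are assumptions made for illustration; the case where $r$ is close to $K$ is left out because the text handles it through (5.2).

```python
def decompose_floor(n: int, K: int, eps: float = 0.1):
    """Return (K1, K2, K3) for Equation (5.1), or None when r is close to K."""
    q, r = divmod(n, K)  # n = q*K + r with q = floor(n/K) and 0 <= r <= K - 1
    if 2 * eps * K <= r <= (1 - 2 * eps) * K:
        # Two sizes suffice: K - r communities of size q, r of size q + 1.
        return (K - r, r, 0)
    if r < 2 * eps * K:
        # Three sizes: move floor(K/3) communities down to q - 1 and
        # the same number (plus r) up to q + 1.
        m = K // 3
        return (K - 2 * m - r, m + r, m)
    return None  # r > (1 - 2*eps)*K: treated via the ceiling version (5.2)


def satisfies_5_1(n: int, K: int, parts) -> bool:
    """Check K1 + K2 + K3 = K and q*K1 + (q+1)*K2 + (q-1)*K3 = n."""
    q = n // K
    K1, K2, K3 = parts
    return K1 + K2 + K3 == K and q * K1 + (q + 1) * K2 + (q - 1) * K3 == n


print(decompose_floor(100, 8))  # r = 4 lies in the middle range
print(decompose_floor(100, 9))  # r = 1 is small, so three sizes are used
```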

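Recall that we use $n_k$ to denote the size of the $k$-th community for each
$k \in [K]$. Without loss of generality, assume there exist $\{K_i\}_{1 \le i \le 3}$ satisfying
Equation (5.1) with $\epsilon K \le \min(K_1, K_2) \le \max(K_1, K_2) \le (1 - \epsilon)K$. Define
a subparameter space of $\Theta^0$ as follows,

$$
\begin{aligned}
\Theta^L(n, K, a, b, \{K_i\}) = \Big\{ & (\sigma, \{\theta_{i,j}\}) \in \Theta^0(n, K, a, b) : |\{k : n_k = \lfloor \tfrac{n}{K} \rfloor\}| = K_1, \\
& |\{k : n_k = \lfloor \tfrac{n}{K} \rfloor + 1\}| = K_2,\ |\{k : n_k = \lfloor \tfrac{n}{K} \rfloor - 1\}| = K_3 \Big\}.
\end{aligned}
$$

For the case with $K = 2$, we can define the least favorable case in an
analogous way. It has a slightly different form depending on whether $n/2$ is
an integer or not. If $n/2$ is not an integer, $\Theta^L(n, 2, a, b) = \{(\sigma, \{\theta_{i,j}\}) \in \Theta^0(n, 2, a, b) :
(n_1, n_2) = (\lfloor \frac{n}{2} \rfloor, \lceil \frac{n}{2} \rceil)\}$. Otherwise, $\Theta^L(n, 2, a, b) = \{(\sigma, \{\theta_{i,j}\}) \in \Theta^0(n, 2, a, b) :
(n_1, n_2) \in \{(\frac{n}{2}, \frac{n}{2}), (\frac{n}{2} + 1, \frac{n}{2} - 1)\}\}$.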

Note that $\Theta^L$ is homogeneous and closed under permutation. Compared
with $\Theta^0$, $\Theta^L$ is quite small, small enough for us to carry out the lower bound analysis.
On the other hand, it is large enough to match the lower bound in Equation
(2.1).


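Lemma 5.1. Let $\tau$ be the uniform prior over all the elements in $\Theta^L$. For
the first node, define the local Bayesian risk to be $B_\tau(\hat\sigma(1)) = \frac{1}{|\Theta^L|} \sum_{\sigma \in \Theta^L} \mathbb{E}\, r(\sigma(1), \hat\sigma(1))$.
Then there exists a constant $\epsilon > 0$ such that

$$
B_\tau(\hat\sigma(1)) \ge \epsilon\, \mathbb{P}\Big( \sum_{u=1}^{\lfloor n/K \rfloor} X_u \ge \sum_{u=1}^{\lfloor n/K \rfloor} Y_u \Big),
$$

where $X_i \sim \mathrm{Ber}(\frac{b}{n})$, $Y_i \sim \mathrm{Ber}(\frac{a}{n})$, for $i = 1, 2, \ldots, \lfloor \frac{n}{K} \rfloor$, and
$\{X_i\} \perp \{Y_i\}$.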

Lemma 5.1 shows that the lower bound only involves $2\lfloor \frac{n}{K} \rfloor$ Bernoulli
random variables, whose success probability is either $a/n$ or $b/n$. Recall
that $a/n$ is the smallest within-community probability and $b/n$ is the largest
between-community probability. The lower bound here will be determined
by testing two probability measures. In $\Theta^L$, the most difficult case is testing
two assignment vectors with Hamming distance 1. The difference of their
probability measures is exactly the difference between the probability measures
of $X$ and $Y$.


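Lemma 5.2. Let $n' = \lfloor \frac{n}{K} \rfloor$. Define $Z_i = X_i - Y_i$ with $\{X_i\} \sim \mathrm{Ber}(\frac{b}{n})$, $\{Y_i\} \sim
\mathrm{Ber}(\frac{a}{n})$, and $\{X_i\} \perp \{Y_i\}$, for $i = 1, 2, \ldots, n'$. If $\frac{nI}{K} \to \infty$, we have

$$
\mathbb{P}\Big( \sum_{i=1}^{n'} Z_i \ge 0 \Big) \ge \exp\big(-(1 + o(1))\, nI/K\big).
$$

In addition, if $nI/K = O(1)$, then $\mathbb{P}\big( \sum_{i=1}^{n'} Z_i \ge 0 \big) \ge c$ for some positive
constant $c > 0$.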

Lemma 5.2 provides an explicit expression for the lower bound. The proof 
mainly follows the proof of Cramer-Chernoff Theorem [27]. The general 
Cramer-Chernoff Theorem gives a lower bound for the tail probability that 
the sum of random variables deviates from its mean. Usually it is for the 
case where these random variables are from a distribution independent of 
the sample size. In our setting we allow a and b to depend on n'. 
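
To get a feel for the tail probability in Lemma 5.2, the following sketch computes $\mathbb{P}(\sum_i X_i \ge \sum_i Y_i)$ exactly from two binomial distributions and compares it with the standard Chernoff upper bound $\exp(-n' I)$, which holds for every $n'$. It assumes $X_i \sim \mathrm{Ber}(b/n)$ and $Y_i \sim \mathrm{Ber}(a/n)$; the function names and parameter values are illustrative, and the sketch does not reproduce the lower-bound argument of the lemma.

```python
import math


def binom_pmf(k: int, m: int, p: float) -> float:
    """P(Bin(m, p) = k)."""
    return math.comb(m, k) * p**k * (1 - p) ** (m - k)


def tail_prob(m: int, a: float, b: float, n: int) -> float:
    """Exact P(sum X_i >= sum Y_i) with m terms, X ~ Ber(b/n), Y ~ Ber(a/n)."""
    p, q = a / n, b / n
    total, cdf_y = 0.0, 0.0
    for k in range(m + 1):
        cdf_y += binom_pmf(k, m, p)           # P(sum Y <= k)
        total += binom_pmf(k, m, q) * cdf_y   # P(sum X = k) * P(sum Y <= k)
    return total


def renyi_half(a: float, b: float, n: int) -> float:
    p, q = a / n, b / n
    return -2.0 * math.log(math.sqrt(p * q) + math.sqrt((1 - p) * (1 - q)))


n, K, a, b = 200, 2, 60.0, 20.0  # assumed example values
m = n // K
exact = tail_prob(m, a, b, n)
chernoff = math.exp(-m * renyi_half(a, b, n))
print(exact, chernoff)
```

The exact probability sits below the Chernoff bound, and the lemma states that, on the exponential scale, the gap is only a $(1 + o(1))$ factor in the exponent.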

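Proof of Theorem 2.2. Since $\Theta^L \subset \Theta^0$, we have $\inf_{\hat\sigma} \sup_{\Theta^0} \mathbb{E}\, r(\sigma, \hat\sigma) \ge
\inf_{\hat\sigma} \sup_{\sigma \in \Theta^L} \mathbb{E}\, r(\sigma, \hat\sigma)$. Due to the fact that the Bayes risk always lower bounds
the minimax risk, we have $\inf_{\hat\sigma} \sup_{\sigma \in \Theta^L} \mathbb{E}\, r(\sigma, \hat\sigma) \ge \inf_{\hat\sigma} B_\tau(\hat\sigma)$. By the
fact that $\Theta^L$ is a homogeneous parameter space closed under permutation
for both $K \ge 3$ and $K = 2$, Lemma 2.1 implies $\inf_{\hat\sigma} B_\tau(\hat\sigma) =
\inf_{\hat\sigma} B_\tau(\hat\sigma(1))$. Thus

$$
\inf_{\hat\sigma} \sup_{\Theta^0} \mathbb{E}\, r(\sigma, \hat\sigma) \ge \inf_{\hat\sigma} B_\tau(\hat\sigma(1)),
$$

which, together with Lemma 5.1 and Lemma 5.2, implies Equation (2.2) of
Theorem 2.2. □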

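5.2. Proof of Theorem 3.2. Recall that $\Delta$ is the set of all permutations
from $[K]$ to $[K]$. For an arbitrary $\sigma \in \Theta^0$, define $\Gamma(\sigma)$ as the equivalence class
of $\sigma$ with $\Gamma(\sigma) = \{\sigma' : \exists \delta \in \Delta, \text{ s.t. } \sigma' = \delta \circ \sigma\}$. We use the notation $\Gamma$ as
a general reference for an equivalence class, and $\{\Gamma\}$ as the set consisting of all
the possible equivalence classes with respect to $\Theta^0$. For any $\sigma_1, \sigma_2 \in \Theta^0$, define
the distance between $\sigma_1$ and $\sigma_2$ as

$$
d(\sigma_1, \sigma_2) = \inf_{\sigma_2' \in \Gamma(\sigma_2)} d_H(\sigma_1, \sigma_2') = \inf_{\sigma_1' \in \Gamma(\sigma_1),\, \sigma_2' \in \Gamma(\sigma_2)} d_H(\sigma_1', \sigma_2').
$$

Here we view $d(\cdot, \cdot)$ as a distance between the equivalence classes $\Gamma(\sigma_1)$ and
$\Gamma(\sigma_2)$. Accordingly the mis-match ratio $r(\sigma, \hat\sigma)$ is exactly equal to

$$
r(\sigma, \hat\sigma) = \frac{1}{n} d(\sigma, \hat\sigma).
$$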

In the following sections we denote the true assignment by $\sigma_0$. Define

$$
P_m = \mathbb{P}\big(\exists \sigma \in \Theta^0 : d(\sigma_0, \sigma) = m \text{ and } T(\sigma) \ge T(\sigma_0)\big)
\tag{5.3}
$$

for any integer $m$ with $0 < m < n$. The key step is to get a tight bound on the
probability $\mathbb{P}(T(\sigma) \ge T(\sigma_0))$ for one fixed assignment $\sigma$ satisfying $d(\sigma, \sigma_0) =
m$. Let $\{n_k\}$ be the sizes of the communities under the truth $\sigma_0$. Without loss
of generality, assume $\sigma_0(i) = k$ for any $i \in [\sum_{j \le k-1} n_j + 1, \sum_{j \le k} n_j]$. Then
the value of $2\sum_{i<j} A_{i,j} 1\{\sigma_0(i) = \sigma_0(j)\}$ is just the sum of all the entries in the
$K$ diagonal blocks of the adjacency matrix $A$. It is illustrated by the color plates
in Figure 1. The gray parts represent the within-community connections,
and blank parts represent the between-community connections. It is easy
to see that $\sum_{i<j} A_{i,j} 1\{\sigma_0(i) = \sigma_0(j)\}$ precisely includes all the gray parts,
i.e., all the Bernoulli random variables with success probability $\frac{a}{n}$ in the
adjacency matrix.









Fig 1. Each gray block stands for all the within-community connections in one single
community. The areas inside the squares are all the $A_{i,j}$ entries summed up. Left: For
$2\sum_{i<j} A_{i,j} 1\{\sigma_0(i) = \sigma_0(j)\}$, the squares exactly overlap with the gray regions. Right: For
$2\sum_{i<j} A_{i,j} 1\{\sigma(i) = \sigma(j)\}$, there would be some differences between the squares and gray
parts, which are labeled as $\alpha$ or $\gamma$ according to their relative positions.

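When $d_H(\sigma, \sigma_0) = d(\sigma, \sigma_0) = m$, by comparing the two color plates in
Figure 1, we can clearly see where the difference $\sum_{i<j} A_{i,j} 1\{\sigma_0(i) = \sigma_0(j)\} -
\sum_{i<j} A_{i,j} 1\{\sigma(i) = \sigma(j)\}$ lies. Note that

$$
\begin{aligned}
& \sum_{i<j} A_{i,j} 1\{\sigma_0(i) = \sigma_0(j)\} - \sum_{i<j} A_{i,j} 1\{\sigma(i) = \sigma(j)\} \\
&\quad = \sum_{i<j} A_{i,j} 1\{\sigma_0(i) = \sigma_0(j)\} 1\{\sigma(i) \ne \sigma(j)\} - \sum_{i<j} A_{i,j} 1\{\sigma(i) = \sigma(j)\} 1\{\sigma_0(i) \ne \sigma_0(j)\}.
\end{aligned}
$$

Define $\alpha(\sigma; \sigma_0) = |\{(i, j) : i < j,\ \sigma_0(i) = \sigma_0(j) \text{ and } \sigma(i) \ne \sigma(j)\}|$, and
$\gamma(\sigma; \sigma_0) = |\{(i, j) : i < j,\ \sigma_0(i) \ne \sigma_0(j) \text{ and } \sigma(i) = \sigma(j)\}|$. We use the
notations $\alpha$ and $\gamma$ for short when there is no ambiguity; then

$$
\sum_{i<j} 1\{\sigma_0(i) = \sigma_0(j)\} - \sum_{i<j} 1\{\sigma(i) = \sigma(j)\} = \alpha - \gamma.
$$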
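
The counts $\alpha$ and $\gamma$ and the identity above can be checked on a small example. The sketch below is illustrative only; the function names and the example assignments are assumptions.

```python
from itertools import combinations


def alpha_gamma(sigma, sigma0):
    """alpha: pairs together under sigma0 but split by sigma; gamma: the reverse."""
    alpha = gamma = 0
    for i, j in combinations(range(len(sigma)), 2):
        same0 = sigma0[i] == sigma0[j]
        same = sigma[i] == sigma[j]
        if same0 and not same:
            alpha += 1
        elif same and not same0:
            gamma += 1
    return alpha, gamma


def same_community_pairs(sigma):
    """Number of pairs i < j with sigma(i) = sigma(j)."""
    return sum(sigma[i] == sigma[j] for i, j in combinations(range(len(sigma)), 2))


sigma0 = [1, 1, 1, 2, 2, 2]  # truth: two communities of size 3
sigma = [1, 1, 2, 2, 2, 2]   # node 3 (index 2) is moved to community 2
alpha, gamma = alpha_gamma(sigma, sigma0)
print(alpha, gamma)
print(same_community_pairs(sigma0) - same_community_pairs(sigma))
```

Moving one node out of a community of size 3 into one of size 3 breaks two within-community pairs and creates three new ones, so $\alpha - \gamma = -1$ here.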

The following proposition is helpful to study Pm defined in Equation (5.3). 

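Proposition 5.1. Let $\sigma \in \Theta^0$ be an arbitrary assignment satisfying
$d(\sigma, \sigma_0) = m$, where $0 < m < n$ is a positive integer. Then, with $X_i \sim \mathrm{Ber}(\frac{b}{n})$ and $Y_i \sim \mathrm{Ber}(\frac{a}{n})$,

$$
\mathbb{P}\big(T(\sigma) \ge T(\sigma_0)\big) \le \mathbb{P}\Big( \sum_{i=1}^{\gamma} X_i - \sum_{i=1}^{\alpha} Y_i \ge \lambda(\gamma - \alpha) \Big) \le \exp\big(-(\alpha \wedge \gamma) I\big),
$$

for $\lambda$ defined in Equation (3.3).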


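Note that the value of $\gamma$ depends on $\sigma$ and $\sigma_0$. Lemma 5.3 provides a
lower bound on $\gamma$ for each $m$.

Lemma 5.3. Let $\sigma \in \Theta^0$ be an arbitrary assignment satisfying $d(\sigma, \sigma_0) =
m$, where $0 < m < n$ is a positive integer. Then

$$
\alpha(\sigma; \sigma_0) \wedge \gamma(\sigma; \sigma_0) \ge
\begin{cases}
\frac{(1 - \eta)nm}{K} - m^2, & \text{if } m \le \frac{n}{2K}, \\
\frac{2(1 - \eta)nm}{9K}, & \text{if } m > \frac{n}{2K}.
\end{cases}
$$
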

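Lemma 5.3, together with Proposition 5.1, immediately implies an upper
bound on $\mathbb{P}(T(\sigma) \ge T(\sigma_0))$ for each given $\sigma$.

Lemma 5.4. Let $\sigma \in \Theta^0$ be an arbitrary assignment satisfying $d(\sigma, \sigma_0) =
m$, where $0 < m < n$ is a positive integer. There exists a positive sequence
$\eta \to 0$, independent of the choice of $\sigma$, such that

$$
\mathbb{P}\big(T(\sigma) \ge T(\sigma_0)\big) \le
\begin{cases}
\exp\big(-\frac{(1 - \eta)nmI}{K} + m^2 I\big), & \text{if } m \le \frac{n}{2K}, \\
\exp\big(-\frac{2(1 - \eta)nmI}{9K}\big), & \text{if } m > \frac{n}{2K},
\end{cases}
$$

for $\lambda$ defined in Equation (3.3).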


We will apply a union bound to get an upper bound for $P_m$. It is
worthwhile to point out that, in the union bound, we should not use the cardinality
of $\{\sigma \in \Theta^0 : d(\sigma, \sigma_0) = m\}$, which is too large because it counts
assignments from the same equivalence class repeatedly. Proposition 5.2 gives an
upper bound for the cardinality of the set of equivalence classes $\{\Gamma\}$.

Proposition 5.2. The number of equivalence classes that have distance
$m$ from $\sigma_0$ is upper bounded as follows,


|r : 3cr e r s.t. d{a, fio) = m| 
where 0 < m < n is a positive integer. 


. f ^ enK ^ m 
< min • 
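For very small $n$ and $K$ the number of equivalence classes at each distance can be enumerated directly and compared with the bound. This is a hedged sketch only. It enumerates all label vectors without the community-size restriction of the parameter space, so it counts a superset of the classes in the proposition. The helper names and the toy $\sigma_0$ are hypothetical.

```python
import math
from itertools import permutations, product

def distance(sigma, sigma0, K):
    """d(sigma, sigma0): Hamming distance minimized over label permutations."""
    return min(sum(delta[s] != s0 for s, s0 in zip(sigma, sigma0))
               for delta in permutations(range(K)))

def count_classes(sigma0, K):
    """Map each distance m to the number of equivalence classes at distance m."""
    seen, counts = set(), {}
    for sigma in product(range(K), repeat=len(sigma0)):
        # Canonical representative of the class of sigma under relabeling.
        canon = min(tuple(delta[s] for s in sigma)
                    for delta in permutations(range(K)))
        if canon in seen:
            continue
        seen.add(canon)
        m = distance(sigma, sigma0, K)
        counts[m] = counts.get(m, 0) + 1
    return counts

def class_bound(n, K, m):
    """The bound min{(enK/m)^m, K^n} of Proposition 5.2."""
    return min((math.e * n * K / m) ** m, K ** n)

sigma0 = (0, 0, 1, 1, 2)  # hypothetical truth with n = 5, K = 3
counts = count_classes(sigma0, 3)
assert all(counts[m] <= class_bound(5, 3, m) for m in counts if m > 0)
```

Counting classes rather than assignments removes a factor of up to $K!$ from the union bound, which is the point made in the paragraph above.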




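With Proposition 5.2 and the union bound we are able to get a satisfactory bound by

$$P_m \le \big|\{\Gamma : \exists \sigma \in \Gamma \text{ s.t. } d(\sigma,\sigma_0) = m\}\big| \max_{\{\sigma :\, d(\sigma,\sigma_0) = m\}} \mathbb{P}(T(\sigma) \ge T(\sigma_0)).$$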


Proof of Theorem 3.2. We only prove the case with $K \to \infty$ and $\frac{nI}{K \log K} \to \infty$. Let $\eta \to 0$ be a universal positive sequence given in Lemma 5.4. We consider three scenarios as follows.


(1) If liminf^^oo x^iogn ^ there exists a small constant e > 0 such 
A~iogn^ > 1 + e. Let T] decay slowly such that both and ^ 

go to infinity. We have Pi < niU exp ( — where R = 

nexp ( — (1 — 2 rj)nl/KY Since 


nEr(cr, <t) < Pi + ^ mPm, 

m=2 

it is sufficient to show Y17=2 is negligible compared with R. For m G 
[ 2 ,m'], where m' = ^, we have 

,enK , (1 - r])nl ..m 

Pm < (^- exp (- — -+ ml)) 

^ ,enK , {l — n)nl ^,,,enK , (1—r?)n/ , ,,^-1 

< (—exp(- ^ +^j))(_exp(- ^ ^ Pm!l)) 

< nexp ( — exp(m/)n“'^^™“^^/^ 

where we use the fact that / < 1 in the fourth inequality to show < 1 

when n is large enough. As a consequence, Y1Y=2 = o(P), as {nT,Pm }^2 

is dominated by a fast-decay geometric series. 
For $m \in [m', n]$, we have
,enK , 

Pm < (—(-—)) 

\ 77),' ' gK >> 

i {1 - 2r])nP ,enK , 2(1 - r/)n/s ^m-g 

< n exp-—— exp- 

P w! ^ 9K P 

Since $m' \to \infty$, $\{mP_m\}_{m > m'}$ is dominated by a fast-decaying geometric series, which leads to $\sum_{m > m'} mP_m = o(R)$.

(2) If limsup„_,.oc K\ozn ^ there exists a small constant e > 0 such that 
^A~iogn^ < 1 — e. Let mo = n exp (— (1 — ^ which satisfies both 

mo > {nKyP and mo = o{-^). We are going to show that {Pm}m>mo is 
upper bounded by a fast decaying series {Qm}m>mo- 
For any m G [mo, m'], where m! = we have 

,,enK. , (1 -? 7 )n/ , ..m 

Pm < ((- )exp(---hm/j) 

< (exp (log(niL) -h ((1 - - (1 - 2 iC~^)) v)nI -^-^m 

, mil — 'n)nP 


2iC"/2 


iC 


which is denoted as Qm- Since > log n, we have YlZ=mo - Yjm=mo 
in'Qmo ^ exp (logn - = o(^). For m' < m, we have 

,enK I 2{\ - r])nl..m ^ , nmP 

)) Sexp(-^). 

Denote Qm = exp ( — which decays geometrically fast, as ^ —)• oo. 

Thus Em=m' Pm < Em=m' Qm ^ “^Qm’ =o{^). Consequently, 

Er(cj, (To) < —^ -|- P(3(T G 0° : d{ao, u) > mo & ^(ct) > l{cro)) 

m' n 

+ Pm' + 


< 


n 

mo 


n 


< —-h m Qmg -|- 2Qm' 

(1 — o(l))n/ 


m>mo 


m>m' 


n 

= exp 


iC 


)• 
(3) If $\frac{nI}{K \log n} = 1 + o(1)$, there exists a positive sequence $\omega \to 0$ such that


I {l—ri)nl 


— 1| <C rc, — < w and 

I ' • / r\cr 'ri - 


wnml 
A log A 


OO 


. Define mo = n exp ( — 


(1 — Thus mo > —)■ oo, and mo = o(m') for m! = uP'n/K. 

We are going to find a fast decay series {Qm} to upper bound {Pm}- For 
m G [mo, m'], 

,,enK. , (1 -r/)n/ , ..m 

Pm < ((-) exp (- - -hm /)) 

^ mo K 


K 


K 


+ 


K 


-y 


, ojil — r])nmP 
<exp(-^), 


which is denoted as Qm- Note that ojuiq > —>• oo. We have Qmo < 1; 

and furthermore 


Pm < Qm < rn'Qmo < exp (logn - v)nl ^ ^ 

m=mo m=mo 

For m G [m',n], we have 

,enK , 2{\-r])nl..m ^ , nmP 

P„<(^exp(- )) <exp(-^), 

Let Qm = exp ( — ^|^), which decays geometrically fast. Then Ylm=m' Fm < 
Em=m' Qm < 2Qm' = o(^)- Hence 


Er((T, do) < — + 'Y] Pm' + 'Y Pm < exp ( - 

n ^^ ^^ 


m>mo 


m>m' 


(1 — o(l))n/ 
K 


)• 


When $K$ is a fixed constant, the proof is nearly identical but with a different $m'$ under each scenario. The proof is thus omitted. □


6. Proofs of Auxiliary Lemmas. We prove Lemma 2.1, Lemma 5.1, Lemma 5.2, Lemma 5.3, Proposition 5.1 and Proposition 5.2 respectively in this section.


6.1. Proof of Lemma 2.1. Before going directly into the proof we define another network operator: (element-wise) permutation. Let $\pi : \{1,2,\ldots,n\} \to \{1,2,\ldots,n\}$ be a permutation. Denote by $\Pi$ the set consisting of all such permutations, whose cardinality is $n!$. Define $\sigma_\pi$ to be a new assignment with

$$\sigma_\pi(i) = \sigma(\pi^{-1}(i)), \quad \forall\, 1 \le i \le n.$$

It is obvious that for an arbitrary assignment $\sigma \in \Lambda$, each of its permutations $\sigma_\pi$ is also in the parameter space $\Lambda$.

On the other hand, a permutation on the nodes leads to a change of the network. For a network $G$ with an adjacency matrix $A$, define $G_\pi$ as the network after permutation with a new adjacency matrix $A_\pi$, where

$$(A_\pi)_{i,j} = A_{\pi^{-1}(i),\pi^{-1}(j)}.$$

Note that $G_\pi$ can be seen as a network sampled from the assignment $\sigma_\pi$, since $(A_\pi)_{i,j} \sim \mathrm{Ber}(\theta_{\pi^{-1}(i),\pi^{-1}(j)})$.

The proof of Lemma 2.1 mainly explores the exchangeability of the network. Any estimator $\hat\sigma$ is a mapping from a network to a vector of length $n$. We use the square brackets $\hat\sigma[G]$ to indicate that $\hat\sigma$ is applied to the network $G$. And $\hat\sigma[G](i)$ is the value of the $i$-th component of $\hat\sigma[G]$; when the meaning is clear, we write $\hat\sigma(i)$ for simplicity.

Based on $\hat\sigma$, we can always design a new (unless they are the same) procedure by permutation. Given a network $G$, we can either directly apply $\hat\sigma$ (to be more precise, it is $\hat\sigma[G]$), or first permute the network into $G_\pi$, then apply $\hat\sigma$ to it to get $\hat\sigma[G_\pi]$, and then finally "permute back" to get the estimation in the original order. To be more precise, define the procedure $\hat\sigma^\pi$ as

$$\hat\sigma^\pi[G](i) = \hat\sigma[G_\pi](\pi(i)).$$

We use the notation $\hat\sigma^\pi(i)$ as short for $\hat\sigma^\pi[G](i)$. See Figure 2 for the illustration on getting $\hat\sigma^\pi$.
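The permute, estimate, permute-back construction can be sketched in a few lines. This is an illustrative sketch under simplifying assumptions: nodes are indexed from 0, a permutation is a list `pi` with `pi[i]` standing for $\pi(i)$, and the two toy estimators are hypothetical and are not procedures from the paper.

```python
def permute_network(A, pi):
    """Return A_pi with (A_pi)[pi[i]][pi[j]] = A[i][j]."""
    n = len(A)
    A_pi = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            A_pi[pi[i]][pi[j]] = A[i][j]
    return A_pi

def permuted_estimator(estimator, pi):
    """Build sigma_hat^pi: apply the estimator to G_pi, then permute back."""
    def est_pi(A):
        labels = estimator(permute_network(A, pi))
        return [labels[pi[i]] for i in range(len(A))]
    return est_pi

def degree_parity(A):
    """Toy estimator that depends only on node degrees (hypothetical)."""
    return [sum(row) % 2 for row in A]

def position_parity(A):
    """Toy estimator that depends on node positions only (hypothetical)."""
    return [i % 2 for i in range(len(A))]

# Path graph 0-1-2-3 and a hypothetical permutation.
A = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
pi = [2, 0, 3, 1]
# A relabeling-invariant estimator is unchanged by the construction.
assert permuted_estimator(degree_parity, pi)(A) == degree_parity(A)
```

An estimator that depends on node positions, such as `position_parity`, is changed by the construction, which is why averaging over all $\pi$ is needed to obtain a procedure with equal local risk at every node.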

Intuitively, due to the exchangeability of $G$, if $\hat\sigma$ is optimal, it should have the same risk as $\hat\sigma^\pi$ for any possible $\pi$. With this trick we are able to show the existence of a universal procedure $\bar\sigma$ which has equal global risk for all $\sigma \in \Lambda$ and equal local risk for all $i \in [n]$. Then the proof is completed by the fact that the minimax risk is lower bounded by the Bayes risk.

Proof of Lemma 2.1. Denote the network to be $G$. Assume $\hat\sigma$ is one of the estimators that achieve the global Bayes risk, i.e., $B_\tau(\hat\sigma) = \inf_{\hat\sigma'} B_\tau(\hat\sigma')$. Based on $\hat\sigma$, we can define a randomized procedure $\bar\sigma$ as $\mathbb{P}(\bar\sigma = \hat\sigma^\pi) = 1/|\Pi|$, for each $\pi \in \Pi$. We will show $\bar\sigma$ is also a global Bayes estimator in terms of $\tau$. For an arbitrary $\sigma \in \Lambda$, we have

$$\mathbb{E}r(\sigma,\bar\sigma) = \frac{1}{|\Pi|}\sum_{\pi \in \Pi} \mathbb{E}r(\sigma,\hat\sigma^\pi).$$

Fig 2. Illustration on getting $\hat\sigma^\pi$ based on the original network $G$. All of $\hat\sigma[G]$, $\hat\sigma[G_\pi]$ and $\hat\sigma^\pi[G]$ are demonstrated as $n$-by-1 vectors. It shows $A_{i,j}$ becomes $(A_\pi)_{\pi(i),\pi(j)}$ after the permutation $\pi$ of the network. For any specific node $i$ in $G$, its location is changed into $\pi(i)$ in $G_\pi$. The procedure $\hat\sigma[G_\pi]$ estimates the assignment of the permuted nodes $\{\pi(i)\}$, while $\hat\sigma^\pi[G]$ estimates the assignment of the original nodes.

Recall that $\mathbb{E}r(\sigma,\hat\sigma^\pi) = \mathbb{E}\inf_{\sigma' \in \Gamma(\hat\sigma^\pi)} d_H(\sigma,\sigma')/n$. There exists a one-to-one relation between $\Gamma(\hat\sigma^\pi)$ and $\Gamma(\hat\sigma[G_\pi])$, in the sense that, for any $\sigma'$ from the former set, there is $\sigma''$ in the latter set such that $\sigma''(i) = \sigma'(\pi^{-1}(i))$, $\forall i \in [n]$, and the reverse also holds. We have the following equation (we add the subscript $\sigma$ to explicitly indicate that the expectation is taken with respect to the assignment $\sigma$).


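$$\mathbb{E}r(\sigma,\hat\sigma^\pi) = \frac{1}{n}\mathbb{E}_\sigma \inf_{\sigma' \in \Gamma(\hat\sigma^\pi)} \sum_{i=1}^n 1\{\sigma(i) \neq \sigma'(i)\}$$

$$= \frac{1}{n}\mathbb{E}_\sigma \inf_{\sigma'' \in \Gamma(\hat\sigma[G_\pi])} \sum_{i=1}^n 1\{\sigma_\pi(\pi(i)) \neq \sigma''(\pi(i))\}$$

$$= \frac{1}{n}\mathbb{E}_\sigma \inf_{\sigma'' \in \Gamma(\hat\sigma[G_\pi])} \sum_{i=1}^n 1\{\sigma_\pi(i) \neq \sigma''(i)\}.$$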
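Thus $\bar\sigma$ also achieves the minimum Bayes risk. We will show $B_\tau(\bar\sigma(i)) = B_\tau(\bar\sigma(j))$ for any $i,j \in [n]$. It is equivalent to define $\bar\sigma$ as

$$\mathbb{P}(\bar\sigma(i) = \hat\sigma^\pi(i)) = \frac{1}{|\Pi|}, \quad \forall i \in [n],$$

which implies

$$\mathbb{E}r(\sigma(i),\bar\sigma(i)) = \frac{1}{|\Pi|}\sum_{\pi \in \Pi} \mathbb{E}r(\sigma(i),\hat\sigma^\pi(i)).$$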
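Note that $\hat\sigma^\pi(i) = \hat\sigma[G_\pi](\pi(i))$, and $\sigma(i) = \sigma_\pi(\pi(i))$. Recall that the definition of the local risk is

$$\mathbb{E}r(\sigma(i),\hat\sigma(i)) = \mathbb{E}\sum_{\sigma' \in S_\sigma(\hat\sigma)} \frac{1\{\sigma(i) \neq \sigma'(i)\}}{|S_\sigma(\hat\sigma)|}.$$

Here recall $S_\sigma(\hat\sigma) = \{\sigma' \in \Gamma(\hat\sigma) : d_H(\sigma,\sigma') = d(\sigma,\hat\sigma)\}$ for any estimator $\hat\sigma$. It is obvious that there exists a one-to-one relation between $S_\sigma(\hat\sigma^\pi)$ and $S_{\sigma_\pi}(\hat\sigma[G_\pi])$. For any $\sigma' \in S_\sigma(\hat\sigma^\pi)$, there is a unique corresponding $\sigma'' \in S_{\sigma_\pi}(\hat\sigma[G_\pi])$ defined as $\sigma''(i) = \sigma'(\pi^{-1}(i))$, $\forall i \in [n]$, and the reverse also holds. Thus the event $\{\sigma(i) \neq \sigma'(i)\}$ is equivalent to $\{\sigma_\pi(\pi(i)) \neq \sigma''(\pi(i))\}$, and $|S_\sigma(\hat\sigma^\pi)| = |S_{\sigma_\pi}(\hat\sigma[G_\pi])|$. We have

$$\mathbb{E}r(\sigma(i),\hat\sigma^\pi(i)) = \mathbb{E}_\sigma \sum_{\sigma'' \in S_{\sigma_\pi}(\hat\sigma[G_\pi])} \frac{1\{\sigma_\pi(\pi(i)) \neq \sigma''(\pi(i))\}}{|S_{\sigma_\pi}(\hat\sigma[G_\pi])|}.$$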


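By the same argument as the previous one, together with the fact that $\mathbb{P}_\sigma(G) = \mathbb{P}_{\sigma_\pi}(G_\pi)$, we expand the expectation and then have

$$\mathbb{E}r(\sigma(i),\hat\sigma^\pi(i)) = \sum_{G \in \mathcal{G}} \sum_{\sigma'' \in S_{\sigma_\pi}(\hat\sigma[G_\pi])} \frac{1\{\sigma_\pi(\pi(i)) \neq \sigma''(\pi(i))\}}{|S_{\sigma_\pi}(\hat\sigma[G_\pi])|}\, \mathbb{P}_\sigma(G)$$

$$= \sum_{G \in \mathcal{G}} \sum_{\sigma'' \in S_{\sigma_\pi}(\hat\sigma[G_\pi])} \frac{1\{\sigma_\pi(\pi(i)) \neq \sigma''(\pi(i))\}}{|S_{\sigma_\pi}(\hat\sigma[G_\pi])|}\, \mathbb{P}_{\sigma_\pi}(G_\pi)$$

$$= \sum_{G \in \mathcal{G}} \sum_{\sigma'' \in S_{\sigma_\pi}(\hat\sigma[G])} \frac{1\{\sigma_\pi(\pi(i)) \neq \sigma''(\pi(i))\}}{|S_{\sigma_\pi}(\hat\sigma[G])|}\, \mathbb{P}_{\sigma_\pi}(G).$$

Thus

$$\mathbb{E}r(\sigma(i),\hat\sigma^\pi(i)) = \mathbb{E}_{\sigma_\pi} \sum_{\sigma'' \in S_{\sigma_\pi}(\hat\sigma[G])} \frac{1\{\sigma_\pi(\pi(i)) \neq \sigma''(\pi(i))\}}{|S_{\sigma_\pi}(\hat\sigma[G])|} = \mathbb{E}r(\sigma_\pi(\pi(i)),\hat\sigma(\pi(i))).$$

This gives

$$\mathbb{E}r(\sigma(i),\bar\sigma(i)) = \frac{1}{|\Pi|}\sum_{\pi \in \Pi} \mathbb{E}r(\sigma_\pi(\pi(i)),\hat\sigma(\pi(i))), \quad \forall i \in [n].$$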
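Then for the local risk we have

$$B_\tau(\bar\sigma(i)) = \frac{1}{|\Lambda|}\sum_{\sigma \in \Lambda} \Big(\frac{1}{|\Pi|}\sum_{\pi \in \Pi} \mathbb{E}r(\sigma_\pi(\pi(i)),\hat\sigma(\pi(i)))\Big)$$

$$= \frac{1}{|\Pi|}\sum_{\pi \in \Pi} \Big(\frac{1}{|\Lambda|}\sum_{\sigma \in \Lambda} \mathbb{E}r(\sigma_\pi(\pi(i)),\hat\sigma(\pi(i)))\Big)$$

$$= \frac{1}{|\Pi|}\sum_{\pi \in \Pi} \Big(\frac{1}{|\Lambda|}\sum_{\sigma \in \Lambda} \mathbb{E}r(\sigma(\pi(i)),\hat\sigma(\pi(i)))\Big)$$

$$= \frac{1}{|\Lambda|}\sum_{\sigma \in \Lambda} \Big(\frac{1}{|\Pi|}\sum_{\pi \in \Pi} \mathbb{E}r(\sigma(\pi(i)),\hat\sigma(\pi(i)))\Big)$$

$$= \frac{1}{|\Lambda|}\sum_{\sigma \in \Lambda} \frac{1}{n}\sum_{i=1}^n \mathbb{E}r(\sigma(i),\hat\sigma(i)),$$

where in the third equation we again use the fact that $\{\sigma_\pi : \sigma \in \Lambda\}$ is exactly equal to $\Lambda$ for any $\pi$. So we conclude $B_\tau(\bar\sigma(i)) = B_\tau(\bar\sigma(j))$ for any $i,j \in [n]$. Due to the equality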


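$$\mathbb{E}r(\sigma,\hat\sigma) = \mathbb{E}\inf_{\delta} \frac{1}{n}\sum_{i=1}^n 1\{(\delta \circ \hat\sigma)(i) \neq \sigma(i)\} = \mathbb{E}\sum_{\sigma' \in S_\sigma(\hat\sigma)} \frac{|\{i : \sigma'(i) \neq \sigma(i)\}|}{n\,|S_\sigma(\hat\sigma)|}$$

$$= \frac{1}{n}\sum_{i=1}^n \mathbb{E}\sum_{\sigma' \in S_\sigma(\hat\sigma)} \frac{1\{\sigma'(i) \neq \sigma(i)\}}{|S_\sigma(\hat\sigma)|} = \frac{1}{n}\sum_{i=1}^n \mathbb{E}r(\sigma(i),\hat\sigma(i)),$$

we have $B_\tau(\bar\sigma) = \frac{1}{n}\sum_{i=1}^n B_\tau(\bar\sigma(i))$, which leads to $\inf_{\hat\sigma} B_\tau(\hat\sigma) = B_\tau(\bar\sigma) = B_\tau(\bar\sigma(1)) \ge \inf_{\hat\sigma} B_\tau(\hat\sigma(1))$. We omit the proof of the other direction of the equality stated in the lemma, which uses a nearly identical argument. The proof is complete. □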


6.2. Proof of Lemma 5.1. First consider the case with K > 3. Define 
©I = ^ • ^(t(i) = 'l^\ + !}• So for each a G ©f, the 

community containing the first node always has size +1. We will show 
the ratio of the cardinality of ©f against that of ©^ is a constant. Denote 
xi = [n/A'jA'i and X 2 = ([n/ATj + l)Ar 2 ) then 


1 ©^ 



and [©f 


C 


/ n — 1 

- 1 
where $C$ is the number of combinations to select $x_1$ balls into $K_1$ bins with
size X 2 balls into K 2 bins with size + 1 , and another n — xi — X 2 
balls into bins with size — 1. Thus 

i©fi ^ ^ 

|0"i (”) " - ■ 

It is equivalent to the probability that the first node is assigned to the K 2 
bins with size + 1. Then 

^ Er(o-(l),d(l)) > ^ Er(cj(l),d(l)). 

'aeef 

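For each $\sigma_0 \in \Theta^L_1$, let $\kappa_0(\sigma_0) = \sigma_0(1)$ be the index of the community that the first node belongs to. And let $\kappa(\sigma_0)$ be the indices of communities whose sizes are $\lfloor n/K \rfloor$, i.e.,

$$\kappa(\sigma_0) = \{k \in [K] : n_k = \lfloor n/K \rfloor\}.$$

Note that $\kappa_0(\sigma_0) \notin \kappa(\sigma_0)$. If we replace $\sigma_0(1)$ by any $k \in \kappa(\sigma_0)$ while keeping the values of the rest of the nodes, then we get a new assignment that is also contained in $\Theta^L_1$ and has distance 1 from $\sigma_0$. In particular, we use the following procedure to get a new assignment $\sigma[\sigma_0]$ based on $\sigma_0$:

$$\sigma[\sigma_0](1) = \begin{cases} \min\{k \in \kappa(\sigma_0) : k > \kappa_0(\sigma_0)\}, & \text{if } \max \kappa(\sigma_0) > \kappa_0(\sigma_0); \\ \min \kappa(\sigma_0), & \text{if } \max \kappa(\sigma_0) < \kappa_0(\sigma_0), \end{cases}$$

and $\sigma[\sigma_0](i) = \sigma_0(i)$ for all $i \ge 2$. It is clear that $\sigma[\sigma_0] \in \Theta^L_1$ and $d_H(\sigma_0,\sigma[\sigma_0]) = 1$. It is also guaranteed that for any $\sigma_0,\sigma_1 \in \Theta^L_1$ with $\sigma_0 \neq \sigma_1$, the new assignments are also different: $\sigma[\sigma_0] \neq \sigma[\sigma_1]$. This implies that $\Theta^L_1$ is equal to the set $\{\sigma[\sigma_0] : \sigma_0 \in \Theta^L_1\}$, and hence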


2Er(o-o(l),<T(l)) 


> 



^(Er(o-o(l),<T(l)) -bEr(cj[o-o](l),<T(l))). 


We are going to derive the Bayes risk $\inf_{\hat\sigma} \frac{1}{2}\big(\mathbb{E}r(\sigma_0(1),\hat\sigma(1)) + \mathbb{E}r(\sigma[\sigma_0](1),\hat\sigma(1))\big)$ for a given $\sigma_0 \in \Theta^L_1$. Let $\hat\sigma$ be any estimator achieving the infimum. Since $d_H(\sigma_0,\sigma[\sigma_0]) = 1$, we have $r(\sigma_0(1),\hat\sigma(1)) = d_H(\sigma_0(1),\hat\sigma(1))$ and a similar equation holds for $\sigma[\sigma_0]$. The estimator $\hat\sigma(1)$ can be interpreted as the Bayes estimator with respect to the zero-one loss. Then $\hat\sigma(1)$ must be the mode of the posterior distribution. Let $J_0$ be the set $\{u \in [n] \setminus \{1\} : \sigma_0(u) = \sigma_0(1)\}$, and $J_1 = \{u \in [n] : \sigma_0(u) = \sigma[\sigma_0](1)\}$. For a given adjacency matrix $A$, the conditional distributions are


p(Aiao) = n 

uGJq 


n 


uGJi 




n 


n 


fiA' 


C\ 


and 


n n n n 

u£Ji uGJq 


Here A^ consists all the rest of the entries; A^ = {{u, v) : v > u > 2, ov u = 
1 and V ^ JqU Ji}. It is obvious that f{A^') is invariant to the choice of ao 
or (t[(To]. Thus 


\cj[uo](l), if Eu&JoAi,u < Eu&j^Ai,u. 

Thus Er(ao(l),d(l)) = P.o (E.eJo < EueJ, Ai,u) > > 

T„), and Er(cj[uo](l),d(l))) = E^[ao] (E«eJo > EneJi > 

T( Xu > Tu). Consequently, 

ln/K\ [n/K\ 

-(Er(o-o(l),d(l))+Er(o-[cJo](l),<T(l))) > P(^ ^ Xu > ^ Tu). 

U=1 U=1 

The above inequality holds for each uo G 0^. Hence 

infH^(d(l)) >inf ^(Er(cJo(l),<T(l))+Er(cj[o-o](l),<T(l))) 

- Y1 +Er(o-[cJo](l),5-(l))) 

^ (ToG©j 

ln/K\ ln/K\ 

>eP( X„> Y 

U=1 U=1 


For the case K = 2, we re-define Qf and show that its cardinality is 
same with that of 0^ up to a constant factor. (1) If ^ [|J, then define 
©f = {(^,{0^:,}) e ©^ : n,(i) = Til}- Then \Qf\/\e^\ = 1/2. (2) If
I = [|J, then define ©f = {(cr, {©i^j}) U ©^ : ^o-(i) > f }• Then 


l©fl 1 |Q^\©f 

0L |0L| 


= 1 - 




^ ^ _ (ra/2 - l)/n ^ 


1 + 


n/2 

n/2+1 


1 

2‘ 


Then with exactly the same argument used for K > 3 we finish the proof. 


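6.3. Proof of Lemma 5.2. (1) First consider the case when $\frac{nI}{K} \to \infty$.

Let $p(x)$ be the probability mass function of $Z_i$, and $M(t)$ be the moment generating function of $Z_i$. That is

$$M(t) = \mathbb{E}e^{tX_i}\,\mathbb{E}e^{-tY_i} = \Big(e^{t}\frac{b}{n} + 1 - \frac{b}{n}\Big)\Big(e^{-t}\frac{a}{n} + 1 - \frac{a}{n}\Big).$$

The minimum of $M(t)$ is achieved at $t^* = \frac{1}{2}\log\frac{a(1-b/n)}{b(1-a/n)}$, with $M(t^*) = \Big(\sqrt{\frac{a}{n}\frac{b}{n}} + \sqrt{\big(1-\frac{a}{n}\big)\big(1-\frac{b}{n}\big)}\Big)^2$. This gives

$$I = -\log M(t^*) = \max_t(-\log M(t)).$$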


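Let $\delta$ be a positive number which may depend on $n$. Denote $S_{n'} = \sum_{i=1}^{n'} Z_i$. Then

$$\mathbb{P}(S_{n'} \ge 0) \ge \sum_{n'\delta > S_{n'} \ge 0} \prod_{i=1}^{n'} p(z_i) \ge \frac{(M(t^*))^{n'}}{\exp(n't^*\delta)} \sum_{n'\delta > S_{n'} \ge 0} \prod_{i=1}^{n'} \frac{\exp(t^* z_i)p(z_i)}{M(t^*)},$$

where we use the fact that $\exp(n't^*\delta) \ge \exp(t^*\sum_i z_i) = \prod_i \exp(t^* z_i)$ when $\sum_i z_i < n'\delta$. Denote $q(w) = \frac{\exp(t^* w)p(w)}{M(t^*)}$. Then

$$\mathbb{P}(S_{n'} \ge 0) \ge \frac{(M(t^*))^{n'}}{\exp(n't^*\delta)} \sum_{n'\delta > S_{n'} \ge 0} \prod_{i=1}^{n'} q(z_i) = \exp(-n'I)\exp(-n't^*\delta) \sum_{n'\delta > S_{n'} \ge 0} \prod_{i=1}^{n'} q(z_i).$$

Note that $q(w)$ is a probability mass function, as $\sum_w q(w) = 1$. Let $W_1, W_2, \ldots, W_{n'}$ be i.i.d. random variables with probability mass function $q(w)$, then

$$\mathbb{P}(S_{n'} \ge 0) \ge \exp(-n'I)\exp(-n't^*\delta)\, \mathbb{P}\Big(\delta > \frac{1}{n'}\sum_{i=1}^{n'} W_i \ge 0\Big).$$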

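A closer look at $W_i$ gives

$$\mathbb{P}(W_i = 1) = \mathbb{P}(W_i = -1) = \frac{1}{M(t^*)}\sqrt{\frac{a}{n}\frac{b}{n}\Big(1-\frac{a}{n}\Big)\Big(1-\frac{b}{n}\Big)},$$

and $\mathbb{P}(W_i = 0) = 1 - \mathbb{P}(W_i = 1) - \mathbb{P}(W_i = -1)$. Thus $\mathbb{E}W_i = 0$ and

$$\mathrm{Var}(W_i) = \frac{2}{M(t^*)}\sqrt{\frac{a}{n}\frac{b}{n}\Big(1-\frac{a}{n}\Big)\Big(1-\frac{b}{n}\Big)}.$$

Denote $V = \mathrm{Var}\big(\sum_{i=1}^{n'} W_i/n'\big) = \mathrm{Var}(W_1)/n'$. We will later show that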
I/{t*W) —^ oo. Now if it holds, then define <5 = V. It satisfies 
y/V = o((5). Chebyshev’s inequality yields 


T ^ T/^ 

lE"'‘ ads;s=»(i). 


n' 


2=1 


By the fact that the distribution of ^ ^11=1 symmetric, we have 




n 


I Z-^ 

i=l 


> 6 


i=l 



To prove I / {t*VV) —)• oo, first consider the case with a b. Since I ^ {a— 
bf/na, and t* = \ log((l + ^)(1 + )) - and \/y x VaKin, 

we have ^ co, implied by the fact ^ x —)■ oo. On the 

other hand, if a/b —)■ co (recall we assume 6 > e > 0 and a/n < 1 — e), 
we have I x a/n and M{t*) x 1. Note that (log^)(^)4 goes to 0, hence 
t*\/V = o{VaK/n). Then \/a/K. Since nl/K x a/K —)■ co, ^y^ 

also goes to inhnity. 

(2) If ^ = 0(1), we can choose 5 such that nt*5/K is also a constant. 
Then by considering the case a ^ b and a/b ^ co separately, we have 
~Jv ^ ntWv ^ ^ with a similar argument used above. Thus P(S'„/ > 0) is 
a constant. 
6.4. Proof of Lemma 5.3. Due to the symmetry between $\sigma$ and $\sigma_0$ (both are in the same parameter space), we have $\alpha(\sigma;\sigma_0) = \gamma(\sigma_0;\sigma)$ and $\gamma(\sigma;\sigma_0) = \alpha(\sigma_0;\sigma)$. It is sufficient to get the desired lower bound for $\gamma(\sigma;\sigma_0)$, as the same bound automatically holds for $\alpha(\sigma;\sigma_0)$.

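By the definition of $\Theta^0$ there must exist an $\eta_1 \to 0$ such that $\big|\frac{n_k}{n/K} - 1\big| \le \eta_1$ for every $k \in [K]$. First consider $m \le \frac{n}{2K}$. Without loss of generality, let $\sigma$ satisfy

$$\sigma(i) = k, \quad \forall i \in \Big[\sum_{j=1}^{k-1} n'_j + 1,\ \sum_{j=1}^{k} n'_j\Big],$$

where $\{n'_k\}_{k=1}^K$ are the sizes of communities in $\sigma$. Recall $\{n_k\}_{k=1}^K$ are the true community sizes in $\sigma_0$. Define $m_k = |\{i : \sigma(i) = k, \sigma_0(i) \neq k\}|$, then $m = \sum_k m_k$. For $k \in [K]$, define

$$\gamma_k(\sigma;\sigma_0) = |\{(i,j) : \sigma(i) = \sigma(j) = k,\ \sigma_0(i) \neq \sigma_0(j),\ i < j\}| = \Big|\Big\{(i,j) : \sigma_0(i) \neq \sigma_0(j),\ \sum_{l=1}^{k-1} n'_l + 1 \le i < j \le \sum_{l=1}^{k} n'_l\Big\}\Big|.$$

Obviously $\gamma(\sigma;\sigma_0) = \sum_k \gamma_k(\sigma;\sigma_0)$. We have $\gamma_k(\sigma;\sigma_0) \ge |\{i : \sigma(i) = k, \sigma_0(i) = k\}|\,|\{i : \sigma(i) = k, \sigma_0(i) \neq k\}| = (n'_k - m_k)m_k$. Then

$$\gamma(\sigma;\sigma_0) \ge \sum_k m_k(n'_k - m_k) \ge \frac{(1-o(1))mn}{K} - \sum_k m_k^2 \ge \frac{(1-o(1))mn}{K} - m^2.$$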

Now consider the case $m > \frac{n}{2K}$. Define $m_{k,k'} = |\{i : \sigma(i) = k, \sigma_0(i) = k'\}|$ for any $k,k' \in [K]$. It is obvious that the equations $m_k = \sum_{k' \neq k} m_{k,k'}$, $n'_k = m_k + m_{k,k}$ and $n_{k'} = \sum_k m_{k,k'}$ hold for any $k$ and $k'$.

It can be shown that we cannot find an pair of {k,k') such as k ^ k' and 
mk,k' > Otherwise, if mk^y > then my^y < ny - mk,y < 

■ Then we can exchange the label of k and k' to get a new estimation 
a'. Compared with d, this helps correctly recover at least mk^y — {n'^ — 
'i^k,y) — ^y,y > 0 nodes. Since a' G r(d), then m = d{ao, a) < dnicro, a') < 
m, which leads to a contradiction. 

So we have mk^y < k ^ k'. For a given m^, we have 

7fc(^;^o) ^ U^k-Ek' m\y) 
n'j^ruk n'^ruk 
with a constrain ■ When it can be shown

that 

7fc(^;<^o) ^ li^k -ml) ^ Wfe - mfc ^ (1 - 5r?i)^ 

n'^m-A; “ nlnik “ 3n'^ 

And when nik > > 

7fc(tT;o-o) ^ - K - - (rn-fc - 

n'^mfc “ n'^mfc 

^ - mk) + ~ 

^ 2(l-5r?i)^ 

9< 

Then sum up over all k and we get 7((T;cro) > '^iki^npIEU _ gy choosing 
rj = 5 t]i the proof is complete. 

6.5. Proofs of Proposition 5.1 and 5.2. We first present Proposition 6.1, which is easy to verify by coupling. It is helpful for the proof of Proposition 5.1.

Proposition 6.1. Let a and 7 be arbitrary positive integers, and m take 
any value in M. Define series of independent variables {W}f=i 7 

and {Pj}7=i. Let Ui ~ Ber{pi), Vi ~ Ber{qi), Xi ~ Ber{p) and 
Yi ~ Ber{q) with minpj > p and maxgj < q. Then 

Q; 7 a 7 

P^m + ViJ < P^m + 

2 = 1 2 = 1 2 = 1 2=1 

Proof of Proposition 5.1. Let $\{X_i\}_{1 \le i \le \gamma}$ be i.i.d. $\mathrm{Ber}(\frac{b}{n})$ random variables and $\{Y_i\}_{1 \le i \le \alpha}$ be i.i.d. $\mathrm{Ber}(\frac{a}{n})$ random variables, and $\{X_i\} \perp \{Y_i\}$. Then by Proposition 6.1, we have

$$\mathbb{P}(T(\sigma) \ge T(\sigma_0)) \le \mathbb{P}\Big(\sum_{i=1}^{\gamma} X_i - \sum_{i=1}^{\alpha} Y_i \ge \lambda(\gamma - \alpha)\Big).$$

As an application of Markov inequality,


7 a 

^{T{a) > T{ao)) < P(^exp (t'^Xi - > exp(tA(7 - 

< (Ee^^^y (Ee~^'^^y 




{l—w)a+w'y 


(Ee-*^i)"' ® ) 


holds for any t > 0. Choose t = t*. Then Ee^^^^Ee = e and 
is exactly equal to 1. Thus E(T(cr) > T{ao)) < e“ ““{“’M-f, 

□ 


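Proof of Proposition 5.2. Without loss of generality we assume that $d_H(\sigma,\sigma_0) = d(\sigma,\sigma_0)$. Then $\sigma$ assigns $m$ nodes with different values from $\sigma_0$, and there are $K$ possible values for each node. Thus

$$\big|\{\Gamma : \exists \sigma \in \Gamma \text{ s.t. } d(\sigma,\sigma_0) = m\}\big| \le \binom{n}{m} K^m \le \Big(\frac{enK}{m}\Big)^m.$$

In addition, since each node has at most $K$ possible choices, we have a naive bound for the number of classes as $|\{\Gamma\}| \le K^n$. □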


SUPPLEMENTARY MATERIAL 

Supplement A: Supplement to "Minimax Rates of Community Detection in Stochastic Block Models"

(url to be specified). In the supplement [30], we provide proofs for Theorems 2.1 and 3.1, which extend the minimax results of Theorems 2.2 and 3.2 to a larger parameter space $\Theta$.
SUPPLEMENT TO "MINIMAX RATES OF COMMUNITY
DETECTION IN STOCHASTIC BLOCK MODELS"


By Anderson Y. Zhang and Harrison H. Zhou
Yale University

APPENDIX A: ADDITIONAL PROOFS
In this appendix we provide the proofs of Theorem 2.1 and Theorem 3.1.


A.1. Proof of Theorem 2.1.

(1) For $K = 2$, the least favorable case for $\Theta$ is still $\Theta^0$. The proof is identical to that of Theorem 2.2.

(2) For $K \ge 3$, it is always possible to have $\sigma \in \Theta$ such that a constant
proportion of communities have size , and another constant proportion 
have the same size [^ 1 , with the rest communities have much larger size. 
Define 0^ to contain all such a. Then with identical arguments used to 
establish Lemma 5.1 and Lemma 5.2 we have 

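$$\inf_{\hat\sigma}\sup_{\Theta} \mathbb{E}r(\sigma,\hat\sigma) \ge \inf_{\hat\sigma}\sup_{\sigma \in \Theta^L} \mathbb{E}r(\sigma,\hat\sigma) \ge c\,\mathbb{P}\Big(\sum_{u=1}^{\lfloor n/\beta K \rfloor} X_u \ge \sum_{u=1}^{\lfloor n/\beta K \rfloor} Y_u\Big) \ge \exp(-(1+o(1))nI/\beta K).$$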


A.2. Proof of Theorem 3.1 {K = 2). Without loss of generality 
we assume | throughout this section. For arbitrary a, (Tq G 0 with 

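$d(\sigma,\sigma_0) = m$, we can define $\alpha(\sigma;\sigma_0)$ and $\gamma(\sigma;\sigma_0)$ the same way as in Section 5.2. Note that $m \le \frac{n}{2}$ since $d(\sigma,\sigma_0) = \min\{d_H(\sigma,\sigma_0), n - d_H(\sigma,\sigma_0)\}$. By Proposition 5.1, we have

$$\mathbb{P}(T(\sigma) \ge T(\sigma_0)) \le \mathbb{P}\Big(\sum_{i=1}^{\gamma} X_i - \sum_{i=1}^{\alpha} Y_i \ge \lambda(\gamma - \alpha)\Big), \quad X_i \sim \mathrm{Ber}\Big(\frac{b}{n}\Big),\ Y_i \sim \mathrm{Ber}\Big(\frac{a}{n}\Big).$$

Note that for $K = 2$ we have the specific equality $\alpha + \gamma = m(n - m)$. Recall that

$$\lambda = -\frac{1}{2t^*}\log\Big(\frac{\frac{a}{n}\exp(-t^*) + 1 - \frac{a}{n}}{\frac{b}{n}\exp(t^*) + 1 - \frac{b}{n}}\Big).$$

By the Chernoff bound,

$$\mathbb{P}(T(\sigma) \ge T(\sigma_0)) \le \big(\mathbb{E}e^{t^*X_1}\big)^{\gamma}\big(\mathbb{E}e^{-t^*Y_1}\big)^{\alpha} e^{-t^*\lambda(\gamma-\alpha)} = \big(\mathbb{E}e^{t^*X_1}\,\mathbb{E}e^{-t^*Y_1}\big)^{\frac{m(n-m)}{2}} = \exp\Big(-\frac{m(n-m)I}{2}\Big),$$

where we use $\mathbb{E}e^{t^*X_1}\,\mathbb{E}e^{-t^*Y_1} = \exp(-I)$ and $e^{2t^*\lambda} = \mathbb{E}e^{t^*X_1}/\mathbb{E}e^{-t^*Y_1}$. The proof is similar to that of Theorem 3.2. Here we only include the key technique and omit the details. Assume $0 < \epsilon < 1/8$. Consider the following three cases: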

(1) If $nI/2 > (1+\epsilon)\log n$, define $m_0 = 1$ and $m' = \epsilon n/2$. Then $P_1 \le n\exp(-(n-1)I/2)$. Denote $R = n\exp(-(n-1)I/2)$. We have

(2^)m exp(— '"*■"' 2 ™'^'^ ) < for rriQ <m<m' 

(^)mexp(-^) < Pexp(-'l^ll^), for m' < m < njl. 

Then $n\mathbb{E}r(\sigma,\hat\sigma) \le \sum_{m \ge 1} mP_m = (1+o(1))R$.

(2) If nI/2 < (1 — e)logn, define mo = nexp(—(1 — e“'^"^'^/^)n//2) and 
m' = nexp(—n//8). We have 


P™ < 


Pm < 


^ mo < m < m', 

(^)’”exp(-^^) < exp(-^^), for m' < m < n/ 2 . 


4 > — 16 

Then $\mathbb{E}r(\sigma,\hat\sigma) \le m_0/n + \sum_{m > m_0} P_m = (1+o(1))m_0/n$.
^' 5 ’/ If 2loir? I’ there exists a positive sequence tc 


0 such that 


nl 


Pm < 


21 ogn ^ uiicic c^iDuo a puoiuivc DC 4 UC 1 H..C uj —r u liiciu l 21 ogn 

1| <C tc and Define mo = nexp(— (1 — w)nI/2) and m' = w'^n. 

'(g)mexp(- "^(’^-"^'T ) < mo < m < m' 

(C)mexp(_i^) < exp(-^^), for m' < m < n/2. 

Then $\mathbb{E}r(\sigma,\hat\sigma) \le m_0/n + \sum_{m > m_0} P_m = (1+o(1))m_0/n$.

A.3. Proof of Theorem 3.1 ($K \ge 3$). For the upper bound, we need the following lemma in place of Lemma 5.3. Other than that, the proof is identical to that for Theorem 3.2 and thus omitted.

Lemma A.l. Assume 1 < /3 < \j\- Let a € Q be an arbitrary assign¬ 
ment satisfying d{a, uq) = m, where f) < m < n is a positive integer. Then 


( nm _ 2 jf jy. 

a(a;ao)A7(a;uo)><|gi m , */m _ 


K 


-, ifm > 


, _ ( 5 - 3 / 32)2 

where eg — 2/3(1+3(5-3,32)2) • 

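Proof of Lemma A.1. It is sufficient to show the inequality for $\gamma(\sigma;\sigma_0)$. First consider the case $m \le \frac{n}{2\beta K}$. Without loss of generality, let $\sigma$ satisfy

$$\sigma(i) = k, \quad \forall i \in \Big[\sum_{j=1}^{k-1} n'_j + 1,\ \sum_{j=1}^{k} n'_j\Big].$$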
^ n'- 
Here $\{n'_k\}_{k=1}^K$ are the sizes of all communities in $\sigma$. Assume $d_H(\sigma,\sigma_0) = m$, then $m = |\{i : \sigma(i) \neq \sigma_0(i)\}|$. Define $m_k = |\{i : \sigma(i) = k, \sigma_0(i) \neq k\}|$, then $m = \sum_k m_k$. For $k \in [K]$, define

$$\gamma_k(\sigma;\sigma_0) = |\{(i,j) : \sigma(i) = \sigma(j) = k,\ \sigma_0(i) \neq \sigma_0(j),\ i < j\}| = \Big|\Big\{(i,j) : \sigma_0(i) \neq \sigma_0(j),\ \sum_{l=1}^{k-1} n'_l + 1 \le i < j \le \sum_{l=1}^{k} n'_l\Big\}\Big|.$$

We see that 7 (cj;iTo) = XliLi ^o)- We have mk < rind 

also 7 fc(cj;cJo) > |{f : a{i) = k,ao{i) = A:}||{i ; iT(f) = k,ao{i) / k}\ = 
mk{nk - mk). Then 


7(tr; fro) > ^ mkiuk - mk) > ^ 

k F 


Now consider the case that $m > \frac{n}{2\beta K}$. Define $m_{k,k'} = |\{i : \sigma(i) = k, \sigma_0(i) = k'\}|$ for any $k,k' \in [K]$. We see that the equations $m_k = \sum_{k' \neq k} m_{k,k'}$, $n'_k = m_k + m_{k,k}$ and $n_{k'} = \sum_k m_{k,k'}$ hold for all $k,k' \in [K]$.

For each $k \in [K]$, we want to bound the value of $\gamma_k(\sigma;\sigma_0)$. We divide $k \in [K]$ into the following two categories:
( 1 ) We say A: G /Ci if for all k' 7^ k, mk,k' < For a given mk, we have 

7fc(fr;ao) ^ U^'j-Jlk' ^jk') 
n'^mk n'f^mk 

with mk = J 2 k'j^k ki^k,k'. When mk < |n),, it is easy to check 

7 fc(fr;fTo) ^ - {n'k- mk)"^ - ml) ^ - mk ^ 1 

n'kmk ~ n'^mk n'^ “3' 


When mk > 


7 fc(fr;cro) 

n'l^mk 


U'^k - (K - ^k)^ - (mk - Inl)^ 
n'^mk 

mkjnl - mk) + K(mfc - |n).) 
n^mk 




Thus 7fc(ci';cJo) > 157^ in both cases. 
( 2 ) We say A: G /C 2 if exists k' ^ k such that ruk^k' > Claim mk’,k' >

\n'k- Otherwise, from a we can exchange the labels k and k' to get a new 

estimator a'. This helps to correctly recover at least mk^k' ~ T^k,k ~ 'i^k',k' > 

\n'j^—^n'k — \n'k > 0 more nodes. Since a' G r((T), this implies m = d{aQ, a) < 

(i//((To, a') < dnicTo, a) = m, which leads to a contradiction. 

On the other hand, we have nik' = n'^ — fUk'^k' > n'y — {uk' — ruk^k') > 

JH _ §11 ^ 2^ > (5-3/3^)n Q rpi ■ 

13K K ^ 3/3A — 3/3A ^ implies 

7fc(q';tTo) +7fc'(o';o-o) ^ 7k'{cr;ao) ^ mk',k'mk' ^ \n',^ ^ sIa 

ikik + fnu ~ ruk + mu ~ mk + mu ~ + 1 ~ ^1 1 

K < K K < K K < K my ' ( 5 - 3 , 3 ) 2 /( 3 / 3 ) ^ ^ 

Thus we have 7fc(cr;<7o) +lk'{cr;ao) > _ 

Apparently [K] = /Ci U /C 2 and /Ci H /C 2 = 0- Claim for any /c G /Ci, there 
exists at most one k' ^ k such that mk'^k > Otherwise if there exists 

another k" ^ k' such that k" 7 ^ k and my^k > Since k', k" G IC 2 , this 

leads to mk,k > 5 (^1 V n'^.,). Then Uk > mk^k + my ^k + mk^k > n'y+ ‘^n'y, > 
^ which leads to a contradiction. Note that < |. Thus 


7 ( 0 -; o'o) 


\ 27fc(t^;f^o) 

A:e[A] 


> 


> 



\fceK:i 

cpnm 


2nmk 

9fdK 


+ E 

keK.2 


2ci3nmk \ 

~ 


K 
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